ON LOCAL U-STATISTIC PROCESSES AND THE ESTIMATION
OF DENSITIES OF FUNCTIONS OF SEVERAL
SAMPLE VARIABLES

By Evarist Giné and David M. Mason

University of Connecticut and University of Delaware 

A notion of local U-statistic process is introduced and central
limit theorems in various norms are obtained for it. This involves
the development of several inequalities for U-processes that may be
useful in other contexts. This local U-statistic process is based on
an estimator of the density of a function of several sample variables
proposed by Frees [J. Amer. Statist. Assoc. 89 (1994) 517-525] and,
as a consequence, uniform in bandwidth central limit theorems in the
sup and in the $L_p$ norms are obtained for these estimators.

1. Introduction. Let $X, X_1, X_2, \ldots$ be i.i.d. random variables taking
values in $\mathbb{R}$, with common density function $f$, and consider the kernel
density estimator of $f$ defined for $t \in \mathbb{R}$,

\[ f_n(t,h_n) = (nh_n)^{-1}\sum_{i=1}^n K(h_n^{-1}(t-X_i)) =: n^{-1}\sum_{i=1}^n K_{h_n}(t-X_i), \tag{1.1} \]

where $\{h_n\}_{n\ge 1}$ is a sequence of positive constants converging to zero at the
rate $nh_n \to \infty$ and the kernel $K$ is an integrable (real) function of bounded
variation satisfying $\int_{\mathbb{R}} K(x)\,dx = 1$ (Parzen [27]). It is easy to prove that,
subject to smoothness conditions on $f$, for each $t \in \mathbb{R}$,

\[ \sqrt{h_n}\,u_n(t) := \sqrt{nh_n}\,\{f_n(t,h_n) - Ef_n(t,h_n)\} \to_d N(0, \|K\|_2^2 f(t)), \]

whereas for any choice of $t_1 \ne t_2$ the random variables $\sqrt{h_n}\,u_n(t_1)$ and
$\sqrt{h_n}\,u_n(t_2)$ are asymptotically independent. This means that $\sqrt{h_n}\,u_n$
cannot converge weakly to a continuous bounded process on any nontrivial
subinterval of $\mathbb{R}$. Since the bias $Ef_n(t,h_n) - f(t)$ can always be dealt with
under the usual conditions on $K$ and $f$, this tells us that $f_n(t,h_n)$ estimates
the density $f$ at a much slower rate than $n^{-1/2}$. On the other hand, Frees [15]
discovered the perhaps surprising fact that the densities $f_g(t)$ of some symmetric
real functions $g(X_1, \ldots, X_m)$ of $m > 1$ i.i.d. random variables can be estimated
at each fixed $t$ at the rate $n^{-1/2}$, using the U-statistic estimator

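\[ \frac{(n-m)!}{n!}\sum_{\mathbf{i}\in I_n^m} K_{h_n}(t - g(X_{i_1},\ldots,X_{i_m})), \]

where $\mathbf{i} = (i_1,\ldots,i_m)$ and $I_n^m = \{(i_1,\ldots,i_m) : 1 \le i_j \le n,\ i_j \ne i_k \text{ if } j \ne k\}$.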
Schick and Wefelmeyer [32] consider the special case 

\[ g(X_1,\ldots,X_m) = \sum_{i=1}^m u_i(X_i), \]

which however is not necessarily a symmetric function. Using convolution
kernels, they also obtain the (in probability) rate of $n^{-1/2}$ for the sup norm
and the $L_1$ norm measure of the discrepancy between the estimator and
the density (actually, they obtain limit theorems in distribution). These
results require smoothness conditions on the kernel, the density and certain
conditional densities.

Our aim is to extend the Schick and Wefelmeyer results to the Frees
framework and with $g$ not necessarily symmetric. Moreover, we frame our
results uniform in bandwidth in the sup norm and the $L_p$ norm with $p \ge 1$ so
that they can be used with adaptive bandwidth estimators. Also, we do this
not necessarily in one dimension, but in $\mathbb{R}^d$. These extensions substantially
increase the scope of applicability of the results and, as a consequence, we
can show how our results recover, extend and/or improve upon previous
work by several authors. We shall not discuss removing the bias in general
but only in some examples and very briefly. The bias is not probabilistic and
can always be treated by adding enough smoothness to the kernel and the
density (see, e.g., the two references just cited and Ahmad and Fan [1]).

We shall begin by generalizing the setup in Giné, Mason and Zaitsev
[21] (concretely, that of Examples 1.2 and 1.3 there) and Mason [25] to
U-statistics. Throughout this paper, we let $X, X_i$, $i \in \mathbb{N}$, be i.i.d. random
variables taking values in a measurable space $(S, \mathcal{S})$; let $g : S^m \mapsto \mathbb{R}^d$,
$1 \le d < \infty$, be a measurable function; let $K : \mathbb{R}^d \mapsto \mathbb{R}$ be an integrable
measurable function that integrates to 1 (a "kernel"); and let $0 < a < b < \infty$.
Then, for $t \in \mathbb{R}^d$ and $\lambda \in [a, b]$, we introduce the local U-statistic

\[ U_n(t,\lambda) := \frac{(n-m)!}{n!}\sum_{\mathbf{i}\in I_n^m} K_{\lambda h_n}(t - g(X_{i_1},\ldots,X_{i_m})), \tag{1.2} \]
where, here and elsewhere in this paper, for functions $H : \mathbb{R}^d \mapsto \mathbb{R}$ and $h > 0$,
the notation

\[ H_h(t) := h^{-1}H(t/h^{1/d}), \qquad t \in \mathbb{R}^d, \tag{1.3} \]

is in force. The term "local" just reflects the fact that the U-process (1.2)
is of a special kind, namely a convolution of an approximate identity $K_{\lambda h_n}$
with an "empirical measure," in this case
$((n-m)!/n!)\sum_{\mathbf{i}\in I_n^m}\delta_{g(X_{i_1},\ldots,X_{i_m})}$,
and therefore, for each value of $t$, the largest contributions to the statistic
come from the values $g(X_{i_1},\ldots,X_{i_m})$ closest to $t$.
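
As an illustration only, definition (1.2) together with the scaling (1.3) can be sketched in a few lines of Python. The function names (`scaled_kernel`, `local_u_statistic`, `box_kernel`) are hypothetical, the sketch assumes a real-valued $g$ (so $d = 1$), and it enumerates all ordered $m$-tuples by brute force, which is only practical for small $n$ and $m$.

```python
import math
from itertools import permutations


def scaled_kernel(kernel, h, d=1):
    """Return K_h(t) = h^{-1} K(t / h^{1/d}) as in (1.3); here t is real (d = 1)."""
    return lambda t: kernel(t / h ** (1.0 / d)) / h


def local_u_statistic(sample, t, lam, h_n, g, kernel, m):
    """Brute-force U_n(t, lam) from (1.2).

    Averages K_{lam*h_n}(t - g(X_{i_1}, ..., X_{i_m})) over all ordered
    m-tuples of distinct indices; (n-m)!/n! is one over the number of tuples.
    """
    n = len(sample)
    k_h = scaled_kernel(kernel, lam * h_n)
    # permutations() draws tuples at distinct positions, matching I_n^m.
    total = sum(k_h(t - g(*xs)) for xs in permutations(sample, m))
    return total * math.factorial(n - m) / math.factorial(n)


def box_kernel(u):
    """Indicator of [-1/2, 1/2]; it integrates to 1."""
    return 1.0 if abs(u) <= 0.5 else 0.0


if __name__ == "__main__":
    data = [0.0, 1.0]
    # m = 2, g(x, y) = x + y: both ordered pairs give g = 1.
    print(local_u_statistic(data, 1.0, 1.0, 1.0, lambda x, y: x + y, box_kernel, 2))
```

With $m = 1$ and $g$ the identity, the same function reduces to the kernel density estimator (1.1).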

Special cases of $U_n(t,\lambda)$, when $X_1, \ldots, X_n$ are i.i.d. $\mathbb{R}^d$ valued, include
the interpoint distance studied by Jammalamadaka and Janson [23], with
$g(x,y) = |x - y|$,

h n U n (0, 1) = - 1 T T i\ X n ~ X h\< M 

and the related short distance process studied by Eastwood and Horváth
[12],

-J— -J2 I{d(X k ,X i2 )< AM, 0<A<1, 
n(n-l).^ 

where $d$ is a distance on $\mathbb{R}^d$, as well as the U-statistic estimator of the
density of the sum $X_1 + \cdots + X_m$,

71 ' — 

lcJ ft 

Our goal is to obtain central limit theorems for the following local U-statistic
process formed from $U_n(t,\lambda)$:

\[ u_{n,\lambda}(t) := \sqrt{n}\,\{U_n(t,\lambda) - EK_{\lambda h_n}(t - g(X_1,\ldots,X_m))\}, \qquad t \in \mathbb{R}^d,\ \lambda \in [a,b]. \tag{1.4} \]

The case when $m = 1$ is a special case of the local empirical process studied
in Mason [25]. We shall confine our attention to the case $m \ge 2$, which we
shall soon see has a radically different asymptotic behavior than the case
$m = 1$. Occasionally, we may restrict the process to $t \in D \subset \mathbb{R}^d$, where $D$
may even consist of a single point.

The limit theorems to be obtained for these processes will be in the sup
and in the $L_p$ norms, $1 \le p < \infty$, uniformly in $\lambda \in [a, b]$ (precise definitions in
the next section). The reason these results will be true will be essentially the
same as in Frees [15], namely: (1) the process (1.4) is equivalent to its linear
part, that is, all the terms in its Hoeffding decomposition of order higher
than one tend to zero in the appropriate way; (2) this linearization,
which is a smoothed empirical process, is equivalent to the empirical process
without smoothing, hence independent of $\lambda$ and $h_n$; (3) this empirical
process, under appropriate and quite weak conditions, satisfies the central
limit theorem in the sup norm or in the $L_p$ norms.

The devil is in the details, and extending the Frees result from one point
$t \in \mathbb{R}$ to uniformity in $t \in \mathbb{R}^d$ and in $\lambda \in [a, b]$ makes for very different
proofs and requires a substantial amount of technique, some of it new.

The statements of the central limit theorems, along with examples, are
collected in Section 2 and all the proofs are postponed to Section 3. In the
process of establishing our results we shall develop some tools that should
be of separate interest. Among them are tight bounds for the absolute
moment of the supremum of the U-statistic process under a uniform covering
number bound, generalizing a similar bound obtained by Einmahl and
Mason [13] and Giné and Koltchinskii [17] (see also Giné and Guillou [16])
for the usual empirical process. All of these results were motivated by
Proposition 6.2 of Talagrand [35], which is an expectation bound for VC classes of
sets, where VC stands for Vapnik and Červonenkis. We also obtain moment
and exponential inequalities for Banach space valued U-statistics which,
although not necessarily optimal, are very easy to apply and are well adapted
to the problems treated in this article. In a sequel to this paper (Giné and
Mason [20]), we derive the corresponding functional laws of the logarithm
for $u_{n,\lambda}$.

2. Main results and examples. As is usual when dealing with empirical
processes, we take $((S \times T)^{\mathbb{N}}, (\mathcal{S} \otimes \mathcal{T})^{\mathbb{N}}, \Pr)$ as the underlying probability
space, where $(S,\mathcal{S})$ is a measurable space, $T = \{-1,1\}$, $\mathcal{T}$ is the family of
all subsets of $T$, and $\Pr = P^{\mathbb{N}} \times (P')^{\mathbb{N}}$, $P$ a probability measure on $(S,\mathcal{S})$
and $P'$ the uniform distribution on $T$. Then the random variables $X_i$ are
the projections $(S \times T)^{\mathbb{N}} \mapsto S$, $X_i(s_1,t_1,s_2,t_2,\ldots) = s_i$, which are i.i.d. with
law $P$. We will occasionally use the random variables $\varepsilon_i(s_1,t_1,s_2,t_2,\ldots) = t_i$,
which obviously satisfy $\Pr\{\varepsilon_i = 1\} = \Pr\{\varepsilon_i = -1\} = 1/2$. Note that the
random variables $\{X_i, \varepsilon_j : i, j \in \mathbb{N}\}$ are independent. The variables $\varepsilon_i$ are
often called Rademacher variables. Sometimes we will write $X$ for $X_1$.

All the asymptotic results on the process $u_{n,\lambda}$ in this article require the
following key assumption.

(CD) For each $i = 1, \ldots, m$, the random variable $g(X_1, \ldots, X_m)$,
conditionally on $X_i = x$, $x \in S$, has a density $\bar f_i(t, x)$, $t \in \mathbb{R}^d$, which
is jointly measurable in $t$ and $x$.

Note that, setting

\[ \bar f = \sum_{i=1}^m \bar f_i, \tag{2.1} \]

the function

\[ f_g(t) := E[\bar f(t,X)]/m, \qquad t \in \mathbb{R}^d, \tag{2.2} \]

defines a density for the random variable $g(X_1, \ldots, X_m)$. Another condition
that we will require for different values of $p \in [1, \infty]$ is

(CDp) $\|\bar f(\cdot,x)\|_p < \infty$ for all $x \in S$ and $\|f_g\|_p < \infty$.

Here and elsewhere, $\|\cdot\|_p$ denotes the $L_p(\mathbb{R}^d)$ norm for $1 \le p < \infty$ and
the sup norm on $\mathbb{R}^d$ for $p = \infty$.
The asymptotics of the processes $u_{n,\lambda}$ will turn out to be equivalent to
that of the processes

\[ v_n(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl(\bar f(t,X_i) - E[\bar f(t,X_1)]\bigr), \qquad t \in \mathbb{R}^d, \tag{2.3} \]

which are empirical processes. With some abuse of notation, we say that

(2.4) $v_n$ converges in law in $L_\infty(\mathbb{R}^d)$

if condition (CD$\infty$) holds and the class of functions $\{\bar f(t,\cdot) : t \in \mathbb{R}^d\}$ is $P$-Donsker.
We refer to Dudley [10] or van der Vaart and Wellner [37] for the definition of
Donsker classes of functions. The condition (CD$\infty$) is equivalent to the maps
$t \mapsto \bar f(t,x)$ and $t \mapsto E\bar f(t,X)$ being in $\ell^\infty(\mathbb{R}^d)$. The abuse of notation consists
in replacing this last space by $L_\infty(\mathbb{R}^d)$, which usually means something else,
and the precise meaning of (2.4) is that $\lim_{n\to\infty} EH(v_n)^* = EH(G)$ for every
$H : \ell^\infty(\mathbb{R}^d) \mapsto \mathbb{R}$ bounded and continuous, where $G = \{G(t) : t \in \mathbb{R}^d\}$ is
the centered Gaussian process with the covariance of $\bar f(\cdot,X)$, more precisely
a sample continuous version, and $H(v_n)^*$ is the (a.s.) smallest measurable
function larger than or equal to $H(v_n)$. Likewise, for $1 \le p < \infty$, we say that

(2.5) $v_n$ converges in law in $L_p(\mathbb{R}^d)$

if the condition (CDp) holds and the $L_p(\mathbb{R}^d)$-valued random variable $\bar f(\cdot,X)$
satisfies the central limit theorem in this space, that is, there is a centered
Gaussian process $G$ with the same covariance as $\bar f(\cdot,X)$ and with sample
paths in $L_p(\mathbb{R}^d)$, such that for every $H : L_p(\mathbb{R}^d) \mapsto \mathbb{R}$ bounded and
continuous,

\[ \lim_{n\to\infty} EH(v_n) = EH(G). \]

Note that in each case, $v_n(\cdot)$ is a random variable that takes values in
$L_p(\mathbb{R}^d)$, $1 \le p \le \infty$, although in the case $p = \infty$, with abuse of notation:
$v_n(\cdot)$ is really in $\ell^\infty(\mathbb{R}^d)$, the space of bounded functions on $\mathbb{R}^d$, and $v_n$ is
not necessarily measurable.

The following definition describes the type of central limit theorem we
will prove for the process $u_{n,\lambda}$.






Definition 1. Let $1 \le p \le \infty$ and assume (CD) and (CDp). The processes
$u_{n,\lambda}$ converge weakly in $L_p(\mathbb{R}^d)$, uniformly in $a \le \lambda \le b$, to the
centered Gaussian process $G$ with the same covariance as $\bar f(\cdot,X)$ if

\[ \sup_{\lambda\in[a,b]} \|u_{n,\lambda} - v_n\|_p \to 0 \quad \text{in pr*} \]

and $v_n$ converges weakly in $L_p(\mathbb{R}^d)$ in the sense of (2.4) for $p = \infty$ and (2.5)
for $p < \infty$.

Convergence in pr* means convergence in probability of the measurable 
envelopes (the smallest dominating measurable functions). 

Note that in the case $p = \infty$, convergence of $u_{n,\lambda}$ to $G$ in the sense of
this definition implies that the processes $(t,\lambda) \mapsto u_{n,\lambda}(t)$ converge in law
in $\ell^\infty(\mathbb{R}^d \times [a, b])$ to the Gaussian process $G$ (see de la Peña and Giné [7],
Dudley [10], or van der Vaart and Wellner [37] for this type of convergence).
However, our notion gives more. In fact if, for $l = 1, \ldots, N$, we have functions
$g_l : S^{m_l} \mapsto \mathbb{R}^{d_l}$ such that the processes $u^{(l)}_{n,\lambda}$ corresponding to $g = g_l$ converge
weakly in $L_\infty(\mathbb{R}^{d_l})$, uniformly in $a_l \le \lambda \le b_l$, to a centered Gaussian process
$G_l$ with the same covariance as $F_l(\cdot,X)$, where $F_l$ is the $\bar f$ corresponding to
$g = g_l$, then it is easy to conclude, using the obvious multivariate extension
of Definition 1, that the vector-valued processes $\vec u_{n,\lambda_1,\ldots,\lambda_N}$ defined by

\[ \vec u_{n,\lambda_1,\ldots,\lambda_N}(t_1,\ldots,t_N) = \bigl(u^{(1)}_{n,\lambda_1}(t_1),\ldots,u^{(N)}_{n,\lambda_N}(t_N)\bigr), \qquad t_1 \in \mathbb{R}^{d_1},\ldots,t_N \in \mathbb{R}^{d_N}, \tag{2.6} \]

converge weakly in $L_\infty(\mathbb{R}^{d_1}) \times \cdots \times L_\infty(\mathbb{R}^{d_N})$ uniformly in $a_l \le \lambda_l \le b_l$,
$l = 1, \ldots, N$, to the centered vector-valued Gaussian process defined on
$\mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_N}$,

\[ G(t_1,\ldots,t_N) = (G_1(t_1),\ldots,G_N(t_N)), \tag{2.7} \]

with the same covariance/cross covariance matrix as $F(\cdot,X)$, where

\[ F(t_1,\ldots,t_N,X) = (F_1(t_1,X),\ldots,F_N(t_N,X)), \qquad t_1 \in \mathbb{R}^{d_1},\ldots,t_N \in \mathbb{R}^{d_N}. \tag{2.8} \]

2.1. Central limit theorems. We still require another definition. We say
that a class of measurable functions $\mathcal{F}$ defined on a measurable space $(S,\mathcal{S})$
is VC-type (VC for Vapnik and Červonenkis) with respect to an envelope
$F$ (meaning a measurable function $F$ such that $|f| \le F$ for all $f \in \mathcal{F}$) if the
covering number $N(\mathcal{F}, L_2(Q), \varepsilon)$, defined as the smallest number of $L_2(Q)$
open balls of radius $\varepsilon$ required to cover $\mathcal{F}$, satisfies

\[ N(\mathcal{F}, L_2(Q), \varepsilon) \le \left(\frac{A\|F\|_{L_2(Q)}}{\varepsilon}\right)^{v}, \qquad 0 < \varepsilon \le 2\|F\|_{L_2(Q)}, \tag{2.9} \]

for some $A \ge 3$ and $v \ge 1$, for every probability measure $Q$ on $S$ for which
$Q(F^2) < \infty$. If (2.9) holds for $\mathcal{F}$, then we say that the VC class $\mathcal{F}$ admits
the characteristics $A$ and $v$.

Theorem 1. Let $p = \infty$, let (CD) and (CD$\infty$) hold, and let $K$ be
bounded. Assume:

(a) each of the classes

\[ \mathcal{K}_n := \{K(h^{-1/d}(y - \cdot)) : y \in \mathbb{R}^d,\ ah_n \le h \le bh_n\} \tag{2.10} \]

is VC-type for a bounded envelope $F_n$ and all the classes $\mathcal{K}_n$ admit the same
characteristics $A$ and $v$;

(b) the density $f_g$ of $g(X_1, \ldots, X_m)$ is bounded and the class of functions
$\mathcal{F} := \{\bar f(t,\cdot) : t \in \mathbb{R}^d\}$ is $P$-Donsker and the identity map

\[ (\mathbb{R}^d, |\cdot|) \mapsto (\mathbb{R}^d, \rho) \]

is uniformly continuous, where $\rho^2(u,v) = \mathrm{Var}(\bar f(u,X) - \bar f(v,X))$;

(c) h n — > and nh n /(l V log^lli^ll^y^/vTT^)) 2 — > oo. The processes 
$u_{n,\lambda}$ then converge weakly in $L_\infty(\mathbb{R}^d)$, uniformly in $a \le \lambda \le b$, to the centered
Gaussian process with the same covariance as $\bar f(\cdot,X)$.

Remark 1. Theorem 1 has obvious applications to the construction
of confidence bands for $f_g$. It is formulated uniformly in $a \le \lambda \le b$ so as
to allow the possibility for $\lambda$ to be replaced by an estimator $\lambda_n$. Suppose
that $\lambda_n = \lambda_n(X_1, \ldots, X_n)$ is an adaptive bandwidth selector such that for
all $0 < \varepsilon < 1$ there exist $0 < c < d < \infty$ for which for all large enough $n$

\[ \Pr\{c \le \lambda_n \le d\} > 1 - \varepsilon. \tag{2.11} \]

Then if assumption (a) of Theorem 1 holds for any choice of $0 < a < b < \infty$
we can immediately conclude from (2.11) that the processes $u_{n,\lambda_n}$ converge
weakly in $L_\infty(\mathbb{R}^d)$ to the centered Gaussian process with the same
covariance as $\bar f(\cdot,X)$. The analogous remark holds for Theorems 2 and 3 below.
For a thorough discussion of bandwidth estimators that satisfy (2.11) refer
to Deheuvels and Mason [5].

Remark 2. We note that if the classes of functions $\mathcal{F}_i = \{\bar f_i(t,\cdot) : t \in \mathbb{R}^d\}$,
$i = 1, \ldots, m$, are $P$-Donsker, then so is $\mathcal{F}$ given in (b) (e.g.,
Theorem 2.10.6 in van der Vaart and Wellner [37] applied to the function
$\phi(t_1, \ldots, t_m) = \sum_{i=1}^m t_i/\sqrt{m}$). A sufficient condition for the class of functions
$\mathcal{F}$ being $P$-Donsker is that it be of VC-type, since the function $\bar f$ being
jointly measurable already ensures that this class of functions is measurable
(e.g., Definition 2.3.3, Example 2.3.5 and Theorem 2.5.2 (Pollard's CLT)
in van der Vaart and Wellner [37], or Dudley [10], Theorem 6.3.1). These
two references contain many examples of classes of functions which are of
VC-type. We single out one of them which is particularly convenient for
condition (a) above. Let $\Phi : \mathbb{R} \mapsto \mathbb{R}$ be a function of bounded variation on
$\mathbb{R}$ (i.e., $\Phi$ is the difference of two bounded nondecreasing functions). The proof
of Lemma 22 in Nolan and Pollard [26] shows that if

\[ K(x) = \Phi(p(x)), \qquad x \in \mathbb{R}^d, \tag{2.12} \]

where $p$ is either a real polynomial on $\mathbb{R}^d$ or the $\alpha$th power of the absolute
value of a real polynomial on $\mathbb{R}^d$, $\alpha > 0$, then the class of functions

\[ \mathcal{K} = \{K(\gamma^{-1}(t - \cdot)) : t \in \mathbb{R}^d,\ \gamma > 0\} \]

is VC-type. Moreover, since the function $K(\gamma^{-1}(t - x))$ is jointly measurable,
this class is also measurable. Most kernels of interest satisfy condition (2.12),
and therefore, condition (a) in Theorem 1.

Sometimes one is interested only in weak convergence in $L_\infty(D)$ uniformly
in $a \le \lambda \le b$, where $D$ is a subset of $\mathbb{R}^d$ that may even consist of a single
point. For instance, this is the case for the interpoint distance process, where
$K(x) = I\{|x| \le 1\}$, $S = \mathbb{R}^d$ and $g(x,y) = x - y$. The following, which is
related to a result of Eastwood and Horváth [12], will be a corollary
to the proof of the above theorem. We say that a subset $D$ of $\mathbb{R}^d$ is
star-shaped about 0 if $x \in D$ implies $\lambda x \in D$ for all $0 \le \lambda \le 1$.

Corollary 1. Let $D$ be a measurable bounded subset of $\mathbb{R}^d$ star-shaped
about 0 and with $\mathrm{Vol}(D) \ne 0$, and set $K = I_D/\mathrm{Vol}(D)$. Assume that
$g(X_1, \ldots, X_m)$ has a bounded density $f_g$, that $E[\bar f(0,X)]^2 < \infty$ and that
for all $\varepsilon > 0$,

\[ \lim_{\delta\to 0}\limsup_{n\to\infty} \Pr{}^*\Bigl\{\sup_{|u|\le\delta} |v_n(u) - v_n(0)| > \varepsilon\Bigr\} = 0. \tag{2.13} \]

Let $h_n \to 0$ be such that $nh_n \to \infty$. Then, the processes $\lambda u_{n,\lambda}(0)$ converge
weakly in $\ell^\infty((0,1])$ to the process $\lambda\sigma Z$, $0 < \lambda \le 1$, where $Z$ is standard
normal and $\sigma^2 = \mathrm{Var}(\bar f(0,X))$.

It makes sense to define $\lambda u_{n,\lambda}(0)$ as zero for $\lambda = 0$. With this convention,
in force from here on, we have weak convergence in $\ell^\infty([0, 1])$ in this corollary.

Condition (2.13) is satisfied, for instance, if the class $\mathcal{F}_\delta := \{\bar f(t,\cdot) : |t| \le \delta\}$
is $P$-Donsker for some $\delta > 0$, a condition weaker than the class $\mathcal{F}$ being
$P$-Donsker.

Next we state the central limit theorems in the $L_p$ norm, uniform on
$a \le \lambda \le b$. We need to recall the definition of Young moduli of exponential
type. As in de la Peña and Giné [7], page 188, $\Psi_1(x) := e^x - 1$, but if $\alpha < 1$,
since $e^{x^\alpha}$ is only convex for $x \ge x_\alpha := ((1-\alpha)/\alpha)^{1/\alpha}$, we take as a function
$\Psi_\alpha$ that is 0 at 0, convex and increasing, and of the order of $e^{x^\alpha}$ for $x$ large
the following:

\[ \Psi_\alpha(x) := \tau_\alpha(x) - \alpha\exp((1-\alpha)/\alpha), \tag{2.14} \]

where $\tau_\alpha(x)$ equals $\exp(x^\alpha)$ if $x \ge x_\alpha$, and equals the tangent line to the
function $y = \exp(x^\alpha)$ at $x = x_\alpha$ for $0 \le x < x_\alpha$. We also recall that then, for
any nonnegative random variable $\xi$ for which $Ee^{(\xi/a)^\alpha} < \infty$ for some $a > 0$,

\[ \|\xi\|_{\Psi_\alpha} := \inf\{c : E\Psi_\alpha(\xi/c) \le 1\}. \tag{2.15} \]

This is a (pseudo)norm and it dominates, up to constants that depend only
on $\alpha$ and $p$, all the $L_p$ (pseudo)norms. Simple standard computations show
that

\[ \Pr\{\xi > x\} \le b\exp\{-(x/a)^\alpha\} \ \text{ for all } x > 0 \implies \|\xi\|_{\Psi_\alpha} \le Ca \tag{2.16} \]

for a constant $C$ that depends only on $\alpha$ and $b$. Note also that $\Psi_\alpha^{-1}(u)$ is
a constant times $u$ for $0 \le u \le \Psi_\alpha(x_\alpha)$ and it is the $1/\alpha$-th power of the
logarithm of $u + \alpha\exp((1-\alpha)/\alpha)$ for $u \ge \Psi_\alpha(x_\alpha)$.
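
The definition (2.14) and the description of the inverse can be made concrete with a short Python sketch. The function names are hypothetical and the code is restricted to $0 < \alpha \le 1$ and nonnegative arguments; for $\alpha = 1$ one has $x_\alpha = 0$ and the formula reduces to $e^x - 1$.

```python
import math


def x_alpha(alpha):
    """Point beyond which exp(x^alpha) is convex: ((1 - alpha)/alpha)^(1/alpha)."""
    return ((1.0 - alpha) / alpha) ** (1.0 / alpha)


def psi(x, alpha):
    """Young modulus Psi_alpha(x) of (2.14), for 0 < alpha <= 1 and x >= 0."""
    xa = x_alpha(alpha)
    offset = alpha * math.exp((1.0 - alpha) / alpha)
    if x >= xa:
        return math.exp(x ** alpha) - offset
    # Below x_alpha: tangent line to y = exp(x^alpha) at x = x_alpha.
    ya = math.exp(xa ** alpha)
    slope = alpha * xa ** (alpha - 1.0) * ya
    return ya + slope * (x - xa) - offset


def psi_inverse(u, alpha):
    """Inverse of psi on [0, infinity): linear near 0, log^(1/alpha) beyond."""
    xa = x_alpha(alpha)
    offset = alpha * math.exp((1.0 - alpha) / alpha)
    if u >= psi(xa, alpha):
        return math.log(u + offset) ** (1.0 / alpha)
    slope = alpha * xa ** (alpha - 1.0) * math.exp(xa ** alpha)
    return u / slope


if __name__ == "__main__":
    for u in (0.1, 1.0, 5.0, 50.0):
        print(u, psi(psi_inverse(u, 0.5), 0.5))
```

The tangent line takes the value $\alpha\exp((1-\alpha)/\alpha)$ at $x = 0$, which is why subtracting that constant gives $\Psi_\alpha(0) = 0$.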
Given a kernel $K$ and $0 < a < b < \infty$, define

\[ \mathcal{K}_{[a,b]} := \{K_h : h \in [a,b]\} \cup \{0\}. \tag{2.17} \]

For any two functions $f$ and $g$ in $L_p$, set

\[ d_p^p(f,g) = \int_{\mathbb{R}^d} |f(t) - g(t)|^p\,dt. \tag{2.18} \]



Theorem 2. Let $2 \le p < \infty$ and let (CD) and (CDp) hold. Assume:

(a) the kernel $K$ is in $L_p(\mathbb{R}^d)$ and

(2.19) f\^ m (N(IC [aM ,d p ,e))de<^; 

J 

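(b) the sum of conditional densities $\bar f$ defined in (2.1) satisfies

\[ \lim_{t\to\infty} t^2\Pr\{\|\bar f(\cdot,X) - E\bar f(\cdot,X)\|_p > t\} = 0 \quad\text{and}\quad \int_{\mathbb{R}^d} \bigl[E(\bar f(u,X) - E\bar f(u,X))^2\bigr]^{p/2}\,du < \infty; \tag{2.20} \]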

(c) h n — > and nh^ p 



oo. 



Then, the processes $u_{n,\lambda}$ converge weakly in $L_p(\mathbb{R}^d)$, uniformly in $a \le \lambda \le b$,
to the centered Gaussian process with the same covariance as $\bar f(\cdot,X)$.
The two conditions in (2.20) are implied by the stronger but sometimes
more convenient conditions $E\|\bar f_i(\cdot,X) - E\bar f_i(\cdot,X)\|_p^2 < \infty$ for $i = 1, \ldots, m$.
This follows immediately from Minkowski's inequality for integrals (e.g.,
Folland [14], page 194). Also, note that for $p = 2$ the first condition in (2.20)
is superfluous.
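
For the reader's convenience, here is one way to spell out that implication. The shorthand $\xi_i$ and $\xi$ is introduced only for this sketch, and the convenient condition is assumed to read $E\|\xi_i\|_p^2 < \infty$ with $\xi_i := \bar f_i(\cdot,X) - E\bar f_i(\cdot,X)$ and $\xi := \sum_{i=1}^m \xi_i$. By the triangle inequality in $L_2(\Pr)$,

\[ (E\|\xi\|_p^2)^{1/2} \le \sum_{i=1}^m (E\|\xi_i\|_p^2)^{1/2} < \infty. \]

Since $p/2 \ge 1$, Minkowski's inequality for integrals gives

\[ \Bigl(\int_{\mathbb{R}^d} [E\xi(u)^2]^{p/2}\,du\Bigr)^{2/p} \le E\Bigl(\int_{\mathbb{R}^d} |\xi(u)|^{p}\,du\Bigr)^{2/p} = E\|\xi\|_p^2 < \infty, \]

which is the second condition, and

\[ t^2\Pr\{\|\xi\|_p > t\} \le E\bigl[\|\xi\|_p^2\, I\{\|\xi\|_p > t\}\bigr] \to 0 \quad (t \to \infty) \]

by dominated convergence, which is the first.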

We should remark here that the conditions (2.20) are precisely the
necessary and sufficient conditions for the $L_p(\mathbb{R}^d)$-valued random variable $\bar f(\cdot,X)$
to satisfy the central limit theorem, that is, for the processes $v_n$ to converge
in law to $G$ in $L_p(\mathbb{R}^d)$, $p \ge 2$ (Pisier and Zinn [29]; see also Araujo and Giné
[2], pages 206-207).

For $1 \le p < 2$ we need $K^2$ and $f_g$ to satisfy certain moment assumptions,
and for this it will be convenient to have the following notation: for $s > 0$,
$p \ge 1$, we define the Borel measure $\mu_s$ on $\mathbb{R}^d$, $L_p(\mu_s)$ and $d_{p,s}$ as

\[ d\mu_s(t) = (1 + |t|)^s\,dt, \qquad L_p(\mu_s) = L_p(\mathbb{R}^d, \mathcal{B}, \mu_s), \qquad d_{p,s}(f,g) = \|f - g\|_{L_p(\mu_s)}, \tag{2.21} \]

the latter for functions $f, g \in L_p(\mu_s)$.

Theorem 3. Let $1 \le p < 2$ and let (CD) and (CDp) hold. Assume $K^2$
and $f_g$ are in $L_1(\mu_s)$ for some $s > d(2-p)/p$. Assume also:

(a) $K$ is in $L_p(\mathbb{R}^d)$ and

(2.22) f *ifjN{K {aA , d p V d 2 ,s,e)) de < oo; 

J 

(b) the sum $\bar f$ of conditional densities satisfies

\[ \int_{\mathbb{R}^d} \bigl[E(\bar f(t,X) - E\bar f(t,X))^2\bigr]^{p/2}\,dt < \infty; \tag{2.23} \]

(c) $h_n \to 0$ and $nh_n \to \infty$.

Then, the processes $u_{n,\lambda}$ converge weakly in $L_p(\mathbb{R}^d)$, uniformly in $a \le \lambda \le b$,
to the centered Gaussian process with the same covariance as $\bar f(\cdot,X)$.

For $p = 1$, condition (2.23) is equivalent to $\int [E\bar f^2(t,X)]^{1/2}\,dt < \infty$.

As above, we should also mention that condition (2.23) is precisely the
necessary and sufficient condition for the $L_p(\mathbb{R}^d)$-valued random variable
$\bar f(\cdot,X)$ to satisfy the central limit theorem, that is, for the processes $v_n$ to
converge in law to $G$ in $L_p(\mathbb{R}^d)$, $1 \le p < 2$ (Vakhania [36] and Jain [22]; see
also Araujo and Giné [2], pages 206-207).

Remark 3. All garden variety kernels satisfy the entropy assumptions of
Theorems 2 and 3. In fact, many such kernels are of the form $K(x) = \Psi(|x|)$,
where $\Psi$ is a function of bounded variation defined on $[0, \infty)$. To see this,
assume that for bounded nondecreasing functions $M$ and $N$ we can write
$\Psi = M - N$. Further, assume that

\[ \int_0^\infty r^{d-1}|M(r)|\,dr = M_0 < \infty \quad\text{and}\quad \int_0^\infty r^{d-1}|N(r)|\,dr = N_0 < \infty. \]

Define the class of functions $\mathcal{V} = \{r \mapsto \Psi(r\lambda^{1/d}) : \lambda \ge 1\}$. We claim that
this class satisfies for some $A_1 > 0$,

\[ N(\mathcal{V}, d_1, \varepsilon) \le A_1\varepsilon^{-1}, \qquad 0 < \varepsilon \le 1. \tag{2.24} \]

To verify this choose $1 \le \lambda < \mu$. We see that

\begin{align*}
\int_0^\infty r^{d-1}\bigl|\Psi(r\mu^{1/d}) - \Psi(r\lambda^{1/d})\bigr|\,dr
&\le \int_0^\infty r^{d-1}\bigl[M(r\mu^{1/d}) - M(r\lambda^{1/d})\bigr]\,dr
 + \int_0^\infty r^{d-1}\bigl[N(r\mu^{1/d}) - N(r\lambda^{1/d})\bigr]\,dr \\
&= (M_0 + N_0)(\lambda^{-1} - \mu^{-1}) =: C(\lambda^{-1} - \mu^{-1}).
\end{align*}

Now as in the proof in Example 1.2 of Giné, Mason and Zaitsev [21] choose
open balls with centers at $\lambda_k = C/(C - k\varepsilon)$ for $k = 0, \ldots, k_0$ where $k_0$ is the
largest integer strictly less than $C/\varepsilon$. This shows (2.24).
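
The covering construction can be checked numerically. The sketch below is illustrative only: the names are hypothetical, and instead of the true $L_1$ distance it uses the upper bound $C|\lambda^{-1} - \mu^{-1}|$ derived above, so a cover for the bound is also a cover for the true distance.

```python
import math


def cover_centers(C, eps):
    """Centers lambda_k = C / (C - k*eps), k = 0, ..., k0, where k0 is the
    largest integer strictly less than C/eps."""
    k0 = math.ceil(C / eps) - 1
    return [C / (C - k * eps) for k in range(k0 + 1)]


def dist_bound(C, lam, mu):
    """Upper bound C * |1/lam - 1/mu| on the L1 distance between the
    functions indexed by lam and mu."""
    return C * abs(1.0 / lam - 1.0 / mu)


def is_covered(C, eps, lam):
    """True if lam >= 1 lies in an open eps-ball (w.r.t. the bound) around a center."""
    return any(dist_bound(C, lam, c) < eps for c in cover_centers(C, eps))


if __name__ == "__main__":
    C, eps = 2.0, 0.25
    centers = cover_centers(C, eps)
    print(len(centers), centers[0], centers[-1])
    print(all(is_covered(C, eps, lam) for lam in (1.0, 1.3, 8.0, 1e6)))
```

Because $C/\lambda_k = C - k\varepsilon$, consecutive centers are exactly $\varepsilon$ apart in the bounding distance, and the number of balls is $k_0 + 1 \le C/\varepsilon + 1 \le (C+1)\varepsilon^{-1}$ for $\varepsilon \le 1$.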

Now consider the class of functions on $\mathbb{R}^d$ given by $\mathcal{K}_1 = \{x \mapsto K(\lambda^{1/d}x) : \lambda \ge 1\}$,
where $K(\lambda^{1/d}x) = \Psi(\lambda^{1/d}|x|)$. Since by changing to polar coordinates

\[ \int_{\mathbb{R}^d} |K(\lambda^{1/d}x) - K(\mu^{1/d}x)|\,dx = C_d\int_0^\infty r^{d-1}\bigl|\Psi(r\lambda^{1/d}) - \Psi(r\mu^{1/d})\bigr|\,dr, \]

where $C_d = d\pi^{d/2}/\Gamma(1 + d/2)$, we see from (2.24) that for some $B_1 > 0$

\[ N(\mathcal{K}_1, d_1, \varepsilon) \le B_1\varepsilon^{-1}, \qquad 0 < \varepsilon \le 1. \tag{2.25} \]

From this result along with boundedness of $K$ we readily get that for all
$p \ge 1$ there is a $B_p > 0$ such that

\[ N(\mathcal{K}_1, d_p, \varepsilon) \le B_p\varepsilon^{-p}, \qquad 0 < \varepsilon \le 1. \tag{2.26} \]

It then follows easily that the class of functions $\mathcal{K}_{[a,b]}$ formed from a kernel
of the form $K(x) = \Psi(|x|)$ obeying the above conditions satisfies the
entropy condition of Theorem 2 and, if $K$ has bounded support or decreases
exponentially in a positive power of $|x|$, that of Theorem 3 as well. Some
commonly used kernels defined on $\mathbb{R}$ of this form are (1) $K(u) = I\{u \in [-1/2, 1/2]\}$,
(2) $K(u) = \frac{1}{2}\exp(-|u|)$, (3) $K(u) = \frac{1}{\sqrt{2\pi}}\exp(-u^2/2)$, and
(4) $K(u) = \frac{3}{4}(1 - u^2)_+$. See, for instance, Devroye [8].
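
As a quick sanity check, the four kernels just listed can be written down and verified to integrate to 1. This is a minimal sketch with hypothetical function names, using a plain midpoint rule rather than any particular numerical library.

```python
import math


def k_box(u):
    """(1) Indicator of [-1/2, 1/2]."""
    return 1.0 if abs(u) <= 0.5 else 0.0


def k_laplace(u):
    """(2) Double exponential kernel (1/2) exp(-|u|)."""
    return 0.5 * math.exp(-abs(u))


def k_gauss(u):
    """(3) Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def k_epanechnikov(u):
    """(4) (3/4)(1 - u^2)_+."""
    return 0.75 * max(0.0, 1.0 - u * u)


def integrate(f, lo=-40.0, hi=40.0, steps=80000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (j + 0.5) * width) for j in range(steps))


if __name__ == "__main__":
    for kernel in (k_box, k_laplace, k_gauss, k_epanechnikov):
        print(kernel.__name__, round(integrate(kernel), 4))
```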
2.2. Examples. In this subsection we show how the previous theorems
apply to a few instances of estimation of the density of a function of several 
variables considered in the literature, as well as to the interpoint distance. 



Example 1 (Linear combinations). (Frees [15], Schick and Wefelmeyer [32])
Suppose that

\[ g(x_1, \ldots, x_m) = \sum_{i=1}^m u_i(x_i), \qquad x_i \in S, \tag{2.27} \]

for measurable functions $u_1, \ldots, u_m$ from $S$ to $\mathbb{R}^d$ such that the random
variable $u_i(X)$ has a density $f_i$ for each $i = 1, \ldots, m$. Then (CD) holds with
$\bar f_i(t,x) = \bar f_i(t - u_i(x))$ and $\bar f_i = f_1 * \cdots * f_{i-1} * f_{i+1} * \cdots * f_m$. Thus,

\[ \bar f(t,x) = \sum_{i=1}^m \bar f_i(t - u_i(x)), \qquad t \in \mathbb{R}^d,\ x \in S. \]
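
To make these formulas concrete, consider the simplest instance, chosen here purely for illustration: $m = 2$, $u_1(x) = u_2(x) = x$ and $X$ standard normal. Then each conditional density is $\phi(t - x)$, their sum is $2\phi(t - x)$, and (2.2) says that $E[2\phi(t - X)]/2$ should be the density of $X_1 + X_2$, that is, the $N(0,2)$ density. The Python sketch below (hypothetical names, midpoint-rule integration) checks this numerically.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f_bar(t, x):
    """Sum of the two conditional densities of X_1 + X_2 given X_i = x (m = 2)."""
    return phi(t - x) + phi(t - x)


def f_g(t, lo=-12.0, hi=12.0, steps=24000):
    """E[f_bar(t, X)] / m as in (2.2), with the expectation over X ~ N(0, 1)
    approximated by a midpoint rule."""
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * width
        total += f_bar(t, x) * phi(x)
    return total * width / 2.0


def normal_density(t, variance):
    """Density of N(0, variance) at t."""
    return math.exp(-t * t / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


if __name__ == "__main__":
    for t in (0.0, 0.7, 2.0):
        print(t, f_g(t), normal_density(t, 2.0))
```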



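The process $u_{n,\lambda}$ is then given by

\[ u_{n,\lambda}(t) = \sqrt{n}\,\Bigl[\frac{(n-m)!}{n!}\sum_{\mathbf{i}\in I_n^m} K_{\lambda h_n}\Bigl(t - \sum_{r=1}^m u_r(X_{i_r})\Bigr) - EK_{\lambda h_n}\Bigl(t - \sum_{r=1}^m u_r(X_r)\Bigr)\Bigr]. \tag{2.28} \]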

We will discuss some conditions under which the central limit theorems in
$L_p(\mathbb{R}^d)$, $p \in [1,\infty]$, uniform in $\lambda$, given above apply in this situation.



Let $d = 1$. If $f$ is of bounded variation on $\mathbb{R}$ then so is the convolution of
$f$ with any density, as is easy to check (in fact, more is true; see, e.g., Schick
and Wefelmeyer [33], Lemma 1). Hence $\bar f_i$ is of bounded variation for each
$i$ if at least two of the densities $f_j$ are. So, assuming the densities $f_i$ are of
bounded variation, then the classes $\mathcal{F}_i = \{\bar f_i(t - \cdot) : t \in \mathbb{R}^d\}$ are of VC type by
Lemma 22 of Nolan and Pollard [26], as mentioned in connection with (2.12).
Also, since the map $(t,x) \mapsto \bar f_i(t - x)$ is jointly measurable, these classes
are $Q$-Donsker for every $Q$ (e.g., Dudley [10], Theorem 6.3.1, page 208).
Hence, $\mathcal{F}_i$ is $P$-Donsker for each $i = 1, \ldots, m$. As observed, for example, in
the proof of Lemma 8, Schick and Wefelmeyer [34], if $f$ is a function of
bounded variation on $\mathbb{R}$, with $f(-\infty+) = 0$ and $T_f$ is the total variation of
its right continuous modification $f(\cdot+)$, then by Fubini's theorem applied
to the product of Lebesgue measure with the measure whose cumulative
distribution function is $f(\cdot+)$, $\int |f(x + s) - f(x)|\,dx \le T_f|s|$ for all $s \in \mathbb{R}$.
With $f = \bar f_i$ this gives

\[ E(\bar f_i(t,X) - \bar f_i(s,X))^2 = \int (\bar f_i(t - u) - \bar f_i(s - u))^2 f_i(u)\,du \le 2T_{\bar f_i}\|\bar f_i\|_\infty\|f_i\|_\infty|t - s|. \]

Hence, since this holds for every $i = 1, \ldots, m$, the identity map $(\mathbb{R}, |\cdot|) \mapsto
(\mathbb{R}, \rho)$ is uniformly continuous. So, if the densities $f_i$ are of bounded variation
on $\mathbb{R}$, then condition (b) in Theorem 1 is satisfied (note that $f_g$ is of bounded
variation, hence bounded).
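
The shift inequality for functions of bounded variation used in this argument is easy to see in a toy case. The sketch below (hypothetical names, not part of the original argument) takes $f$ to be the uniform density on $[0,1]$, whose total variation on $\mathbb{R}$ is 2 (one jump up, one jump down), and approximates $\int |f(x+s) - f(x)|\,dx$ by a midpoint rule; for $|s| \le 1$ the exact value is $2|s|$.

```python
def uniform_density(x):
    """Density of the uniform distribution on [0, 1]; total variation 2 on R."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def l1_shift(f, s, lo=-3.0, hi=4.0, steps=70000):
    """Midpoint-rule approximation of the integral of |f(x + s) - f(x)| over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * width
        total += abs(f(x + s) - f(x))
    return total * width


if __name__ == "__main__":
    for s in (0.1, 0.3, -0.5):
        print(s, l1_shift(uniform_density, s), 2.0 * abs(s))
```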

Let $d > 1$. Finding interesting conditions on the densities $f_i$ in order for
condition (b) in Theorem 1 to be satisfied is a little more cumbersome
in this case. Here are some conditions:

(1) $f_i$ is $\alpha$-Hölder continuous for some $0 < \alpha \le 1$ and of bounded support,
for each $1 \le i \le m$. In this case one can easily check the VC property.

(2) $f_i$ is of bounded variation on $\mathbb{R}^d$ in the sense that it is the difference
of the ($d$-dimensional) distribution functions of two positive measures
(necessarily of the same mass). In this case the Donsker property follows from
Dudley [10], Corollary 10.2.8, page 327.

(3) The functions $\bar f_i$ satisfy condition (2.12). This condition is directly
imposed on convolutions because (2.12) may not be inherited by convolution,
except in particular cases. Some important examples, like full $d$-variate
normal densities, satisfy it.

As a consequence of the above discussion, we have proved the following
theorem that improves on Theorem 1 in Schick and Wefelmeyer [32] in that
the convergence is uniform in $\lambda \in [a, b]$, the kernel is not necessarily a
convolution and the window sizes $h_n$ are allowed to decrease at a smaller rate.
Their result includes the bias part whereas ours does not, and we discuss
this immediately below.

Theorem 4. Let $g$ be defined by (2.27) and let $K$ satisfy condition
(2.12). If $d = 1$ assume that the densities $f_i$ of $u_i(X)$ are of bounded variation
and if $d > 1$ assume that $f_i$ satisfy (1) or (2) or that $\bar f_i$ satisfy (3) in the
previous paragraph. Let $h_n \to 0$ and $nh_n/(\log h_n^{-1})^2 \to \infty$. Then the processes
$u_{n,\lambda}$ defined by (2.28) converge weakly in $L_\infty(\mathbb{R}^d)$ uniformly in $a \le \lambda \le b$
to the centered Gaussian process with the same covariance as
$\bar f(\cdot,x) = \sum_{i=1}^m \bar f_i(\cdot - u_i(x))$.

Next we comment on the bias part in this theorem. What we do is
standard and can also be done for the rest of the results in this article, but we
will refrain from doing so. Suppose that the densities $f_i$ are in $C^k(\mathbb{R}^d)$ and
that their partial derivatives of order $k$ or smaller are all bounded. Then the
same is true for $f_g = f_1 * \cdots * f_m$ (this follows, e.g., from Proposition 8.10
in Folland [14] and Young's inequalities) and therefore $f_g$ admits a Taylor
development of the form

fg(t + S)-f 9 (t) 

= E^ E 1Sl (tW l 1 ---5 s / + H(t,k,5)\6\ k , 

,r! . , o sl x\ ■ ■ • o Sd Xd 

r=l siH \-Sd=r 

0<s t <r 

where $H$ is uniformly bounded by a constant times the common bound for
the $k$th partial derivatives of $f_g$ (actually, by continuity, $H \to 0$ as $\delta \to 0$).
Suppose moreover that the kernel $K$ satisfies the conditions

\[ \int |t|^k|K(t)|\,dt < \infty \quad\text{and}\quad \int t_1^{s_1}\cdots t_d^{s_d}K(t)\,dt = 0 \ \text{ for } s_i \ge 0,\ 0 < \sum_{i=1}^d s_i < k. \]

Then, integration after change of variables and use of Taylor's formula for
$f_g(t + (\lambda h_n)^{1/d}u) - f_g(t)$ give

\[ \sup_{\lambda\in[a,b]} \sqrt{n}\,\bigl|EK_{\lambda h_n}(t - g(X_1,\ldots,X_m)) - f_g(t)\bigr| = O(\sqrt{n}\,h_n^{k/d}). \]

Therefore, under these extra conditions on $f_i$ and $K$, if $nh_n^{2k/d} \to 0$, the
conclusion of the previous theorem can be modified to convergence in law
in $\ell^\infty(\mathbb{R}^d \times [a, b])$ of

\[ \sqrt{n}\,\Bigl\{\frac{(n-m)!}{n!}\sum_{\mathbf{i}\in I_n^m} K_{\lambda h_n}\Bigl(t - \sum_{r=1}^m u_r(X_{i_r})\Bigr) - f_g(t)\Bigr\} \]

to the same centered Gaussian process. If the partial derivatives up to order
$k$ of $f_i$ are bounded and in $L_1$, then $f_g \in C^{mk}(\mathbb{R}^d)$ (e.g., by iteration in
Proposition 8.10, Folland [14]), and the previous discussion applies with $k$
replaced by $mk$.

Simultaneous estimation of convolutions. Here is an immediate application
of Theorem 4 to the simultaneous estimation of the densities of convolutions.
Let $f$ be a density of bounded variation on $\mathbb{R}$. We are interested in
estimating the convolutions $f^{*2} = f * f, \ldots, f^{*N}$, for $N \ge 2$. For $m = 2, \ldots, N$,
$t_m \in \mathbb{R}$ and $a_m \le \lambda_m \le b_m$ introduce the estimators

\[ f_n^{*m}(t_m, \lambda_m) := \frac{(n-m)!}{n!}\sum_{\mathbf{i}\in I_n^m} \frac{1}{\lambda_m h_n} K\Bigl(\frac{t_m - \sum_{r=1}^m X_{i_r}}{\lambda_m h_n}\Bigr) \]

and set 
or, under extra conditions as in the previous remark, replace $Ef_n^{*m}(t_m, \lambda_m)$
by $f^{*m}(t_m)$. [These estimators correspond to the functions $g = g_m$
defined by $g_m(x_1, \ldots, x_m) = x_1 + \cdots + x_m$.] A direct application of Theorem 4
in combination with the observations (2.6)-(2.8) gives that the vector-valued
processes $(u^{(2)}_{n,\lambda_2}(t_2), \ldots, u^{(N)}_{n,\lambda_N}(t_N))$ converge weakly in
$L_\infty(\mathbb{R}) \times \cdots \times L_\infty(\mathbb{R})$
uniformly in $a_m \le \lambda_m \le b_m$, $m = 2, \ldots, N$, to the centered vector-valued
Gaussian process defined on $\mathbb{R} \times \cdots \times \mathbb{R}$, with the same covariance/cross
covariance matrix as

\[ \bigl(2f(t_2 - X), \ldots, Nf^{*(N-1)}(t_N - X)\bigr). \]

We note here that Schick and Wefelmeyer [32] use a variation of the Frees [15] local U-statistic estimator of convolutions of densities. Their estimator is based on convolving kernel density estimators. Here is how their approach works in the case of estimating the density of $X_1 + X_2$, where $X_1$ and $X_2$ are i.i.d. real valued with density $f$. Consider the kernel density estimator

\[ \hat f_n(x) = (nh_n)^{-1}\sum_{i=1}^n k\bigl(h_n^{-1}(x - X_i)\bigr), \]

where $k$ is of bounded variation on $R$. Their estimator is $\hat f_n * \hat f_n$ and can be expressed, with $K = k * k$, as

\[ \hat f_n * \hat f_n(x) = \frac{1}{n^2 h_n}\sum_{i,j=1}^n K\Bigl(\frac{x - X_i - X_j}{h_n}\Bigr) = \frac{n-1}{n}\,U_{n,h_n}(x) + \frac{1}{n^2 h_n}\sum_{i=1}^n K\Bigl(\frac{x - 2X_i}{h_n}\Bigr), \]

where $U_{n,h_n}(x)$ is the Frees type local U-statistic estimator of $f * f(x)$ defined by

\[ U_{n,h_n}(x) = \bigl(h_n n(n-1)\bigr)^{-1}\sum_{1\le i\ne j\le n} K\bigl(h_n^{-1}(x - X_i - X_j)\bigr). \]

The second term in the above expression for $\hat f_n * \hat f_n(x)$ is asymptotically negligible and Theorem 4 applies to $U_{n,h_n}$. This remark applies as well to simultaneous estimation of convolutions of densities.
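As a rough numerical illustration of this decomposition, the following sketch computes the Frees type statistic and the diagonal term for a simulated sample and checks that together they reproduce the full double sum. The Gaussian choice of $k$ (so that $K = k * k$ is the N(0, 2) density), the sample size, the bandwidth and all function names are assumptions made for the example only.

```python
import numpy as np

def K(z):
    # K = k * k for a standard Gaussian k, i.e. the N(0, 2) density (assumed kernel)
    return np.exp(-z ** 2 / 4.0) / np.sqrt(4.0 * np.pi)

def pair_sums(sample):
    # matrix of X_i + X_j over all ordered pairs (i, j)
    return sample[:, None] + sample[None, :]

def frees_u(x, sample, h):
    # U_{n,h}(x): average of K((x - X_i - X_j) / h) / h over ordered pairs i != j
    n = len(sample)
    vals = K((x - pair_sums(sample)) / h) / h
    return (vals.sum() - np.trace(vals)) / (n * (n - 1))

def diagonal_term(x, sample, h):
    # (n^2 h)^{-1} * sum_i K((x - 2 X_i) / h)
    n = len(sample)
    return K((x - 2.0 * sample) / h).sum() / (n ** 2 * h)

def convolved_kde(x, sample, h):
    # the convolved kernel estimator written as the full double sum over (i, j)
    n = len(sample)
    return K((x - pair_sums(sample)) / h).sum() / (n ** 2 * h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(400)
h, x = 0.3, 0.0
n = len(sample)
u = frees_u(x, sample, h)
full = convolved_kde(x, sample, h)
recombined = (n - 1) / n * u + diagonal_term(x, sample, h)
target = 1.0 / np.sqrt(4.0 * np.pi)  # value at 0 of f * f when f is N(0, 1)
print(u, full, recombined, target)
```

The equality of `full` and `recombined` is exact algebra; the closeness of `u` to `target` depends on the sample and the bandwidth.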

We can complement the above results for $L_\infty(R^d)$ with limit theorems for the $L_p$ distance. We now do this for the cases $p = 1$ and $p = 2$.

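Theorem 5. Assume $K$ satisfies condition (a) in Theorem 2 and let $g(x_1,\dots,x_m) = \sum_{i=1}^m u_i(x_i)$, $u_i : S \mapsto R^d$ be measurable, where $u_i(X)$ has density $f_i$. Assume $f_i$ is in $L_2(R^d)$. Assume also $h_n \to 0$ and $nh_n \to \infty$. Then

\[ \sqrt{n}\,\frac{(n-m)!}{n!}\sum_{i\in I_n^m}\Bigl\{ K_{\lambda h_n}\Bigl(t - \sum_{r=1}^m u_r(X_{i_r})\Bigr) - E K_{\lambda h_n}\Bigl(t - \sum_{r=1}^m u_r(X_r)\Bigr)\Bigr\} \]

converges in law in $L_2(R^d)$ uniformly in $\lambda \in [a,b]$ to the centered Gaussian process with the same covariance as $\sum_{i=1}^m f_{(i)}(\cdot - u_i(X_i))$, where $f_{(i)}$ denotes the density of $\sum_{r\ne i} u_r(X_r)$.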

This theorem follows because by Young's inequality (e.g., Folland [14], page 240), $f_i \in L_2$ for $i = 1,\dots,m$ implies $E\int \bar f(t,X)^2\,dt < \infty$. As in the previous theorem, a little more smoothness on $f_i$ and higher-order kernels allow for elimination of the bias.

We can also recover the Schick and Wefelmeyer [32], Theorem 2 on the CLT for the $L_1$ norm, with weaker assumptions (note that condition (2.22) is vacuous if $[a,b]$ reduces to a single point).

If two densities belong to $L_1(\mu_s)$ for some $s > 0$, then so does their convolution as can be seen by direct computation using the trivial observation that $(1 + |u + v|) \le (1 + |u|)(1 + |v|)$, and it is also routine to check that if any finite number of densities and their squares belong to $L_1(\mu_s)$, so does the square of their convolution. Moreover, by Lemma 1 to be proved below, if $f$ and $k$ are two nonnegative functions in $L_1(\mu_s)$ for some $s > d$, then $\int (f * k)^{1/2}(t)\,dt < \infty$. Hence Theorem 3 gives:

Theorem 6. Let $g(x_1,\dots,x_m) = \sum_{i=1}^m u_i(x_i)$, $u_i : S \mapsto R^d$ be measurable, with $u_i(X)$ having density $f_i$, for $i = 1,\dots,m$, and let $K$ be a kernel on $R^d$. Assume that for some $s > d$, $K^2$, $f_i$ and $f_i^2$ are in $L_1(\mu_s)$, $i = 1,\dots,m$, and that $K$ satisfies condition (a) in Theorem 3 for this $s$. Assume also $h_n \to 0$ and $nh_n \to \infty$. Then

\[ \sqrt{n}\,\frac{(n-m)!}{n!}\sum_{i\in I_n^m}\Bigl\{ K_{\lambda h_n}\Bigl(t - \sum_{r=1}^m u_r(X_{i_r})\Bigr) - E K_{\lambda h_n}\Bigl(t - \sum_{r=1}^m u_r(X_r)\Bigr)\Bigr\} \]

converges in law in $L_1(R^d)$ (uniformly in $\lambda \in [a,b]$) to the centered Gaussian process with the same covariance as $\sum_{i=1}^m f_{(i)}(\cdot - u_i(X_i))$, where $f_{(i)}$ denotes the density of $\sum_{r\ne i} u_r(X_r)$.

In $R$, if the densities $f_i$ are bounded (they do not need to be) then the condition imposed on $f_i$ is simply that $\int |x|^{1+\delta} f_i(x)\,dx < \infty$ for some $\delta > 0$.

Example 2 (Distribution of sample distances). Frees [15] considers estimating the density of the interpoint functional $g(X_1,X_2) = |X_1 - X_2|$ in two dimensions, where $X_i$ are i.i.d. with a density $f$ which is bounded and of bounded support. We further assume that $f$ is $\alpha$-Hölder continuous for some $0 < \alpha \le 1$, that is, $|f(u) - f(v)| \le C|u - v|^\alpha$ for all $u, v \in R^2$. Then, it is easy to see that $f_g$, the density of $g$,

\[ f_g(t) = \int_0^{2\pi}\!\!\int_{R^2} f(x_1 + t\cos\theta, x_2 + t\sin\theta)\, f(x_1,x_2)\, t\, dx_1\, dx_2\, d\theta, \]

is bounded and has bounded support. Moreover, for $x = (x_1,x_2)$ and with $R$ the radius of a ball around zero containing the support of $f$, we have, for the conditional densities $f_i$ of $|X_1 - X_2|$,

\[ f_1(t,x) = f_2(t,x) = \int_0^{2\pi} f(x_1 + t\cos\theta, x_2 + t\sin\theta)\, t\, d\theta, \qquad 0 \le t \le 2R, \]

and $f_i(t,x) = 0$ for larger values of $t$. Then $\bar f = f_1 + f_2$ satisfies

\[ |\bar f(t,\cdot) - \bar f(s,\cdot)| \le 8\pi R C|t - s|^\alpha + 4\pi\|f\|_\infty |t - s|, \]

and therefore, with $\mathcal F = \{\bar f(t,\cdot) : |t| \le 2R\}$ and any probability measure $Q$, $N(\mathcal F, L_2(Q), \varepsilon) \le C/\varepsilon^{1/\alpha}$ which, since the class $\mathcal F$ is image admissible Suslin by joint measurability of $\bar f$, implies that the class $\mathcal F$ is P-Donsker. So, if we take a kernel $K$ satisfying (2.12), and $h_n \to 0$ with $nh_n/(\log h_n^{-1})^2 \to \infty$, we get that the processes defined for $|t| \le R$, $\lambda \in [a,b]$,

\[ \frac{\sqrt n}{n(n-1)}\sum_{1\le i\ne j\le n}\bigl\{ K_{\lambda h_n}(t - |X_i - X_j|) - E K_{\lambda h_n}(t - |X_i - X_j|)\bigr\}, \]

converge in law uniformly in $t$ and $\lambda$ to the centered Gaussian process with the same covariance as $2f_1(t,X)$. Moreover, the comments following Theorem 4 regarding replacement of $(\lambda h_n)^{-1} E K((\lambda h_n)^{-1}(t - |X_i - X_j|))$ by $f_g(t)$ apply here as well.
In dimension 1,

\[ f_1(t,x) = f_2(t,x) = f(t + x) + f(-t + x), \qquad t > 0, \]

and a sufficient condition for $\mathcal F$ to be P-Donsker is that $f$ be of bounded variation on $R$. Then, $K$ of bounded variation on $R$ and $nh_n/(\log h_n^{-1})^2 \to \infty$ ensure the same CLT as above. Since

\[ f_g(x) = \int f(y + x) f(y)\,dy + \int f(y - x) f(y)\,dy \]

and

\[ f_g(0) = 2\int f^2(y)\,dy, \]

as observed by Frees [15], we get a $\sqrt n$ consistent estimator of $\int f^2(x)\,dx$ when the extra smoothness conditions to make the bias tend to zero hold. See also Bickel and Ritov [4] or Giné and Mason [19] and references therein for other $\sqrt n$ consistent estimators of $\int f^2(x)\,dx$.
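The estimator of $\int f^2$ suggested by the last display can be sketched numerically as follows. Because the kernel used below is symmetric, $K_h(|X_i - X_j|) = K_h(X_i - X_j)$, so the sketch averages $K_h(X_i - X_j)$ over pairs $i \ne j$; this targets the density of $X_1 - X_2$ at 0, which equals $\int f^2 = f_g(0)/2$, and working with the difference sidesteps the boundary of the support of $|X_1 - X_2|$ at $t = 0$. The Gaussian kernel, the bandwidth, the sample size and the function names are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def gaussian_kernel(z):
    # standard normal density, used here as an assumed symmetric kernel
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def integrated_squared_density(sample, h):
    """U-statistic estimate of the integral of f^2 from pairwise differences."""
    n = len(sample)
    diffs = sample[:, None] - sample[None, :]
    vals = gaussian_kernel(diffs / h) / h
    off_diagonal = vals.sum() - np.trace(vals)  # drop the i == j terms
    return off_diagonal / (n * (n - 1))

rng = np.random.default_rng(1)
sample = rng.standard_normal(500)
estimate = integrated_squared_density(sample, h=0.2)
exact = 1.0 / (2.0 * np.sqrt(np.pi))  # integral of f^2 for the N(0, 1) density
print(estimate, exact)
```

For a fixed bandwidth the estimate carries a small smoothing bias; the bias conditions mentioned in the text are what remove it asymptotically.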
Example 3 (Local interpoint distance processes). (Jammalamadaka and Janson [23], Eastwood and Horváth [12].) In Corollary 1 we consider the processes

\[ \frac{\sqrt n}{h_n}\Bigl\{ \frac{(n-m)!}{n!}\sum_{i\in I_n^m} I\bigl(g(X_{i_1},\dots,X_{i_m}) \in (\lambda h_n)^{1/d}D\bigr) - \Pr\{g(X_1,\dots,X_m) \in (\lambda h_n)^{1/d}D\}\Bigr\} \]

for $D$ star shaped about 0. Of particular interest in the literature is the case corresponding to $m = 2$, $S = R^d$, $g(X_1,X_2) = X_2 - X_1$ and $D$ the (open or closed) unit ball about zero for some norm in $R^d$. In this case the densities $f_i(t,x)$, $i = 1, 2$, are, respectively, $f(x - t)$ and $f(x + t)$, where $f$ is the density of $X$. So, for the local asymptotic equicontinuity condition (2.13) to hold we only need that the class $\mathcal F = \{f(\cdot + t) : |t| \le \delta\}$ be P-Donsker for some $\delta > 0$. If $f$ is Hölder continuous of order $\alpha \in (0,1]$, and $Q$ is any probability measure on $R^d$, then

\[ \int \bigl(f(x + t) - f(x + s)\bigr)^2\,dQ(x) \le C|t - s|^{2\alpha}, \qquad s, t \in R^d. \]

It follows as in the previous example that $\mathcal F$ is VC and measurable, hence Q-Donsker for every probability measure $Q$. Thus, Corollary 1 implies the following slight strengthening and generalization of Theorem 1.1 in Eastwood and Horváth [12].

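Theorem 7. If $D$ is a bounded measurable subset of $R^d$ star-shaped about zero and with $\mathrm{Vol}(D) \ne 0$, $X_i$ are i.i.d. random vectors in $R^d$ with a density $f$ which is $\alpha$-Hölder continuous for some $\alpha \in (0,1]$, and $h_n \to 0$, $nh_n \to \infty$, then the processes

\[ \frac{1}{n^{3/2} h_n \mathrm{Vol}(D)}\sum_{1\le i\ne j\le n}\Bigl[ I\bigl(X_j - X_i \in (\lambda h_n)^{1/d}D\bigr) - \Pr\{X_2 - X_1 \in (\lambda h_n)^{1/d}D\}\Bigr], \qquad 0 \le \lambda \le 1, \]

converge in law in $\ell^\infty[0,1]$ to the process $\lambda\sigma Z$, $0 \le \lambda \le 1$, where $Z$ is $N(0,1)$ and $\sigma^2 = 4\bigl[\int f^3(x)\,dx - \bigl(\int f^2(x)\,dx\bigr)^2\bigr]$.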

3. Proofs. In the sequel it will be helpful to introduce the following notation and facts. For a kernel $L$ of $k \ge 1$ variables we set

\[ U_n^{(k)}(L) = \frac{(n-k)!}{n!}\sum_{i\in I_n^k} L(X_{i_1},\dots,X_{i_k}) \tag{3.1} \]

[so, $U_n(t,\lambda) = U_n^{(m)}(K_{\lambda h_n}(t - g(\cdot,\dots,\cdot)))$ even if $g$ is not symmetric in its entries]. [When $L$ is a constant function we define $U_n^{(0)}(L) = L$.] Assume now that $L$ is a function of $m \ge 1$ variables, symmetric in its entries. Then, for $1 \le k \le m$, the Hoeffding projections with respect to $P$ are defined as

\[ \pi_k L(x_1,\dots,x_k) = (\delta_{x_1} - P)\times\cdots\times(\delta_{x_k} - P)\times P^{m-k}(L) \tag{3.2} \]

and $\pi_0 L = EL(X_1,\dots,X_m)$. Then, the Hoeffding decomposition states the following, which is easy to check:

\[ U_n^{(m)}(L) - EL = \sum_{k=1}^m \binom{m}{k} U_n^{(k)}(\pi_k L). \tag{3.3} \]

For $L \in L_2(P^m)$ this is an orthogonal decomposition and $E(\pi_k L \mid X_2,\dots,X_k) = 0$ for $k \ge 1$; that is, the kernels $\pi_k L$ are canonical for $P$ (or completely degenerate, or completely centered). Also, $\pi_k$, $k \ge 1$, are nested projections, that is, $\pi_k \circ \pi_\ell = \pi_k$ if $k \le \ell$, and $E(\pi_k L)^2 \le E(L - EL)^2 \le EL^2$.
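The decomposition (3.3) and the canonical property can be checked exactly on a small example. The sketch below takes $m = 2$, a three-point distribution $P$ and the kernel $L(x,y) = (x-y)^2/2$; these choices, and all variable names, are assumptions made purely for illustration. The projections are computed from their definition (3.2) on the support of $P$, and (3.3) is then verified on a simulated sample.

```python
import numpy as np

support = np.array([0.0, 1.0, 3.0])   # support of the assumed discrete law P
probs = np.array([0.5, 0.3, 0.2])     # P-probabilities of the support points

def L(x, y):
    # symmetric kernel of m = 2 variables; its mean is the variance of P
    return 0.5 * (x - y) ** 2

L_table = L(support[:, None], support[None, :])   # L on the support grid
theta = probs @ L_table @ probs                    # E L(X_1, X_2)
cond = L_table @ probs                             # x -> E L(x, X)
pi1 = cond - theta                                 # first Hoeffding projection
pi2 = L_table - cond[:, None] - cond[None, :] + theta  # second projection

def u_stat_2(table, idx):
    # U_n^{(2)}: average of table[i, j] over ordered pairs of distinct sample indices
    vals = table[np.ix_(idx, idx)]
    n = len(idx)
    return (vals.sum() - np.trace(vals)) / (n * (n - 1))

rng = np.random.default_rng(0)
idx = rng.choice(len(support), size=50, p=probs)   # sample, stored as support indices

lhs = u_stat_2(L_table, idx) - theta
rhs = 2.0 * pi1[idx].mean() + u_stat_2(pi2, idx)
print(lhs, rhs)
```

Here the identity holds for every sample, not only on average, which is the algebraic content of (3.3); the canonical property shows up as `pi2 @ probs` being the zero vector.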

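The function $K_h(t - g(X_1,\dots,X_m))$ is not necessarily symmetric in its entries, but we can symmetrize it as

\[ \bar K_h(t,x_1,\dots,x_m) := \frac{1}{m!}\sum_{\sigma} K_h\bigl(t - g(x_{\sigma_1},\dots,x_{\sigma_m})\bigr). \tag{3.4} \]

Then, clearly, for each $t \in R^d$,

\[ U_n(t,\lambda) - E K_{\lambda h_n}(t - g(X_1,\dots,X_m)) = U_n^{(m)}\bigl(\bar K_{\lambda h_n}(t,\cdot,\dots,\cdot)\bigr) - E\bar K_{\lambda h_n}(t,X_1,\dots,X_m). \]

Moreover, by applying (3.3) to $u_{n,\lambda}(t)$ we get

\[ u_{n,\lambda}(t) = \sqrt n\sum_{k=1}^m \binom{m}{k} U_n^{(k)}\bigl(\pi_k\bar K_{\lambda h_n}(t,\cdot)\bigr). \tag{3.5} \]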

3.1. A general proposition for the CLT. Let us consider the first term in the expansion (3.5). Note that, by definition of $\bar f$ and $f_g$,

\[ \begin{aligned} m\,\pi_1\bar K_{\lambda h_n}(t,x) &= \sum_{i=1}^m E K_{\lambda h_n}\bigl(t - g(X_1,\dots,X_{i-1},x,X_{i+1},\dots,X_m)\bigr) - m\,E K_{\lambda h_n}\bigl(t - g(X_1,\dots,X_m)\bigr) \\ &= \int K_{\lambda h_n}(t - u)\bigl(\bar f(u,x) - E\bar f(u,X)\bigr)\,du \\ &= \int \bigl(\bar f(t - u,x) - E\bar f(t - u,X)\bigr) K_{\lambda h_n}(u)\,du. \end{aligned} \]

Hence,

\[ \begin{aligned} \sqrt n\, m\, U_n^{(1)}\bigl(\pi_1\bar K_{\lambda h_n}(t,\cdot)\bigr) &= \frac{1}{\sqrt n}\sum_{i=1}^n \int \bigl(\bar f(t - u,X_i) - E\bar f(t - u,X)\bigr) K_{\lambda h_n}(u)\,du \\ &= \int \nu_n(t - u) K_{\lambda h_n}(u)\,du = (\nu_n * K_{\lambda h_n})(t), \end{aligned} \tag{3.6} \]

which is a generalized version of the smoothed empirical process that has been recently investigated by several authors (e.g., Rost [31] and references therein). We shall see that it controls the asymptotic behavior of the process $u_{n,\lambda}$. In the next proposition we show it has the same asymptotic behavior as the empirical process over the class of functions $\{\bar f(t,\cdot), t \in R^d\}$.

Proposition 1. Let $1 \le p \le \infty$. Assume:

(i) condition (CDp) holds;

(ii) $\lim_{\delta\to 0}\limsup_{n\to\infty}\Pr\{\sup_{|u|\le\delta}\|\nu_n(\cdot - u) - \nu_n(\cdot)\|_p > \varepsilon\} = 0$ for all $\varepsilon > 0$;

(iii) the sequence $\|\nu_n\|_p$, $n \in N$, is stochastically bounded.

Then, whenever $h_n \to 0$ we have

\[ \lim_{n\to\infty}\sup_{a\le\lambda\le b}\bigl\|\sqrt n\, m\, U_n^{(1)}\bigl(\pi_1\bar K_{\lambda h_n}(\cdot,\cdot)\bigr) - \nu_n(\cdot)\bigr\|_p = 0 \quad\text{in pr.} \tag{3.7} \]

Proof. Let $w_\delta(\nu_n) = \sup_{|u|\le\delta}\|\nu_n(\cdot - u) - \nu_n(\cdot)\|_p$, $\delta > 0$, denote the $L_p(R^d)$ modulus of "continuity" of $\nu_n$, which is defined because of (i). Then it follows by Fubini in the case $p = \infty$ and by Minkowski's inequality for integrals (e.g., Folland [14], page 194) in the case $1 \le p < \infty$ that

\[ \begin{aligned} \|\nu_n * K_{\lambda h_n} - \nu_n\|_p &\le w_\delta(\nu_n)\|K_{\lambda h_n}\|_1 + 2\|\nu_n\|_p\int_{|u|>\delta}|K_{\lambda h_n}(u)|\,du \\ &\le w_\delta(\nu_n)\|K\|_1 + 2\|\nu_n\|_p\int_{|u|>\delta/(\lambda h_n)^{1/d}}|K(u)|\,du. \end{aligned} \]

Now, the result follows from this, $K \in L_1(R^d)$ and (ii) and (iii), in view of (3.6). □
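For completeness, here is a sketch of how the first inequality in the proof arises. It assumes the normalization $K_h(u) = h^{-1}K(u/h^{1/d})$ and $\int K = 1$ from the earlier sections, writes $h = \lambda h_n$, and treats the case $1 \le p < \infty$.

```latex
\begin{align*}
(\nu_n * K_h)(t) - \nu_n(t)
 &= \int \bigl(\nu_n(t-u) - \nu_n(t)\bigr) K_h(u)\,du, \\
\|\nu_n * K_h - \nu_n\|_p
 &\le \int_{|u|\le\delta} \|\nu_n(\cdot-u)-\nu_n(\cdot)\|_p\,|K_h(u)|\,du
   + \int_{|u|>\delta} \bigl(\|\nu_n(\cdot-u)\|_p + \|\nu_n\|_p\bigr)|K_h(u)|\,du \\
 &\le w_\delta(\nu_n)\,\|K_h\|_1 + 2\|\nu_n\|_p \int_{|u|>\delta} |K_h(u)|\,du, \\
\int_{|u|>\delta} |K_h(u)|\,du
 &= \int_{|v|>\delta/h^{1/d}} |K(v)|\,dv
 \qquad (u = h^{1/d} v,\ du = h\,dv).
\end{align*}
```

The second line uses Minkowski's inequality for integrals, and the third uses the translation invariance $\|\nu_n(\cdot - u)\|_p = \|\nu_n\|_p$. Since $h = \lambda h_n \le b h_n \to 0$, the last integral tends to zero uniformly in $\lambda \in [a,b]$ because $K$ is integrable.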

Corollary 2. Let $1 \le p \le \infty$. Assume (CDp) and hypothesis (b) in Theorem 1 for $p = \infty$, in Theorem 2 for $2 \le p < \infty$ and in Theorem 3 for $1 \le p < 2$. Then the processes $\nu_n$ converge weakly in $L_p(R^d)$ to the centered Gaussian process $G$ with the covariance of $\bar f(\cdot,X)$. If moreover $h_n \to 0$, then the limit (3.7) holds.

Proof. (a) Case $p = \infty$. In this case, the hypothesis of the class $\mathcal F$ being P-Donsker is just another way of saying that $\nu_n$ converges in law to $G$ in $L_\infty(R^d)$, and this obviously implies (iii); finally, (ii) holds by the uniform continuity of the identity map $(R^d,|\cdot|) \mapsto (R^d,\rho)$ together with the usual asymptotic equicontinuity condition (e.g., Dudley [10], pages 117-118).

(b) Case $1 \le p < \infty$. As mentioned in Section 2, the hypotheses (b) in Theorems 2 and 3 are precisely the necessary and sufficient conditions for the process $\bar f(\cdot,X)$ to satisfy the CLT in $L_p$, that is, for $\nu_n$ to converge in law to $G$ in $L_p$. Moreover, (iii) is a direct consequence of this convergence. Finally, condition (ii) holds by the uniform tightness implied by weak convergence together with the Fréchet-Kolmogorov characterization of compact sets of $L_p(R^d)$ (see, e.g., Dunford and Schwartz [11], Theorem IV.8.21, page 301). □

In view of (3.5) and Corollary 2, to complete our CLT program for $u_{n,\lambda}$, that is, the proofs of Theorems 1, 2 and 3, it only remains to show that the hypotheses in the statements of these theorems also imply

\[ \sup_{a\le\lambda\le b}\bigl\|\sqrt n\, U_n^{(k)}\bigl(\pi_k\bar K_{\lambda h_n}\bigr)\bigr\|_p \to 0 \quad\text{in pr.},\qquad k = 2,\dots,m. \tag{3.8} \]

Proving this constitutes the main part of our proofs, and requires new inequalities for U-processes that we develop in the next subsections.

3.2. Inequalities for U-processes. In the next two subsections we collect the inequalities we need to prove the limits (3.8). First we consider the case of U-processes indexed by VC classes of functions and obtain a moment inequality that generalizes the scope of that of Einmahl and Mason [13] and Giné and Koltchinskii [17] for empirical processes. Next, we consider U-statistics taking values on separable type 2 Banach spaces such as $L_p$, $p \ge 2$, and derive exponential inequalities for them. The inequality that we get in this situation is particularly neat. Then, based on the method recently used by Giné, Latała and Zinn [18] to prove inequalities for U-statistics, we derive both moment and exponential inequalities for other Banach spaces, such as $L_p$, $1 \le p < 2$. These are less clean than in the type 2 case, but are still usable. Neither of our exponential inequalities captures the Gaussian tail behavior that the statistic should have for small values of $x$; nevertheless, their application yields very strong results in the situations encountered in this article.

3.3. U-processes indexed by VC classes. In this subsection we consider classes of measurable functions $\mathcal F$ defined on $(S^m,\mathcal S^m)$ taking values in $[-1,1]$, and we assume that $0 \in \mathcal F$. The object is to obtain a bound for $E\|U_n^{(k)}(\pi_k f)\|_{\mathcal F}$ where $\mathcal F$ is of VC-type, and where we use the notation $\|\Psi(f)\|_{\mathcal F} = \sup_{f\in\mathcal F}|\Psi(f)|$ for any functional $\Psi$ defined on the class $\mathcal F$. This bound will require measurability on the class $\mathcal F$ described in de la Peña and Giné [7], page 138: the class $\mathcal F$ should be measurable in the sense that for every $k = 1,\dots,m$ and every choice of $a_{i_1,\dots,i_k} \in \{-1,1\}$, the mapping $(x_1,\dots,x_n) \mapsto \|\sum_{I_n^k} a_{i_1,\dots,i_k} P^{m-k}f(x_{i_1},\dots,x_{i_k})\|_{\mathcal F}$ is measurable for the completion of $\mathcal S^n$. This holds, for instance, if (a) there exists a countable subclass $\mathcal F_0$ such that this sup equals the sup over $\mathcal F_0$, or (b) the $\sigma$-algebra $\mathcal S$ is countably generated and contains the singletons, and the class $\mathcal F$ is image admissible Suslin, for instance, if it is parametrized by a complete separable metric space $T$ in such a way that the evaluation map $(t,x_1,\dots,x_m) \mapsto f_t(x_1,\dots,x_m)$ is jointly measurable (Dudley [10], Section 5.3; van der Vaart and Wellner [37], Section 2.3.1; Pollard [30], page 196). These conditions allow us to randomize by independent random signs and use Fubini. If either of these two conditions is satisfied, we say that the class $\mathcal F$ is measurable.

The following moment inequality will be instrumental in finishing the 
proof of Theorem 1. The proof of this inequality has several points in com- 
mon with the proofs of similar inequalities for m = 1 in Einmahl and Mason 
[13] and in Gine and Koltchinskii [17]; however the present proof does not 
rely on the square root trick or on the contraction principle for Rademacher 
processes. 

Theorem 8. Let $\mathcal F$ be a measurable collection of functions $S^m \mapsto R$ symmetric in their entries with an envelope function $F$ and let $P$ be any probability measure on $(S,\mathcal S)$ (with $X_i$ i.i.d. $P$). Assume $F$ is bounded by $M > 0$ and $\mathcal F$ is VC with respect to $F$ with characteristics $A$ and $v$, as in (2.9). Then for every $m \in N$, $A \ge e^m$, $v \ge 1$, there exist constants $C_1 := C_1(m,A,v,M)$ and $C_2 = C_2(m,A,v,M)$ such that

\[ n^{k/2} E\|U_n^{(k)}(\pi_k f)\|_{\mathcal F} \le C_1\sigma\Bigl(\log\frac{A\|F\|_{L_2(P^m)}}{\sigma}\Bigr)^{k/2},\qquad k = 0,1,\dots,m, \tag{3.9} \]

assuming

\[ n\sigma^2 \ge C_2\log\frac{A\|F\|_{L_2(P^m)}}{\sigma}, \tag{3.10} \]

where $\sigma^2$ is any number satisfying

\[ \|P^m f^2\|_{\mathcal F} \le \sigma^2 \le P^m F^2. \tag{3.11} \]

Proof. Without loss of generality we assume $F \le M = 1$. The theorem is true for all $m$ and $k = 0$ by Hölder's inequality, since $U_n^{(0)}(\pi_0 f) = P^m f$, so we assume the statement to be true for all $m \in N$ and $k - 1$, for some $k \ge 1$, and prove it for $k$ (and for all $m \in N$). We shall omit symbols when no confusion is possible, so, for instance, we write $\|\cdot\|$ for $\|\cdot\|_{\mathcal F}$. We shall not keep track of constants; in particular, constants that depend on a subset of $m, A, v, M$, will generically be denoted by $C$ (so, the value of $C$ may change from line to line). By Theorem 3.5.3 in de la Peña and Giné [7], the definition of $\pi_k$ and Jensen's inequality,

\[ E\Bigl\|\sum_{I_n^k}(\pi_k f)(X_{i_1},\dots,X_{i_k})\Bigr\| \le C E\Bigl\|\sum_{I_n^k}\varepsilon_{i_1}\cdots\varepsilon_{i_k}(\pi_k f)(X_{i_1},\dots,X_{i_k})\Bigr\| \le C E\Bigl\|\sum_{I_n^k}\varepsilon_{i_1}\cdots\varepsilon_{i_k}(P^{m-k}f)(X_{i_1},\dots,X_{i_k})\Bigr\|. \]

Here $X_i, \varepsilon_j$ are all independent, the $X$'s have law $P$ and the $\varepsilon$'s are random signs (Rademacher variables). Since for any probability measure $Q$ on $S^k$,

\[ Q\bigl(P^{m-k}(f - g)\bigr)^2 \le Q\times P^{m-k}(f - g)^2, \]

it follows from the VC property of $\mathcal F$ that

\[ N\bigl(P^{m-k}\mathcal F, L_2(Q), \tau\bigr) \le \Bigl(\frac{A\|\sqrt{P^{m-k}F^2}\|_{L_2(Q)}}{\tau}\Bigr)^v,\qquad 0 < \tau \le 2\|\sqrt{P^{m-k}F^2}\|_{L_2(Q)}, \]

that is, $P^{m-k}\mathcal F$ is VC-type with characteristics $A$ and $v$ and envelope $\sqrt{P^{m-k}F^2}$. This gives, by the entropy integral for Rademacher chaos of order $k$ (de la Peña and Giné [7], Corollary 5.1.8, upon noting that exponential Orlicz norms dominate $L_p$ norms up to constants),

\[ |I_n^k|^{-1/2} E_\varepsilon\Bigl\|\sum_{I_n^k}\varepsilon_{i_1}\cdots\varepsilon_{i_k}(P^{m-k}f)(X_{i_1},\dots,X_{i_k})\Bigr\| \le C\int_0^{\sqrt{\|U_n^{(k)}((P^{m-k}f)^2)\|}}\Bigl(\log\frac{A\sqrt{U_n^{(k)}(P^{m-k}F^2)}}{\tau}\Bigr)^{k/2}d\tau, \]

where $E_\varepsilon$ denotes expectation with respect to the Rademacher variables only. [To apply Corollary 5.1.8 exactly to our process we need that, for $X_1,\dots,X_n$ fixed, the Rademacher chaos process

\[ f \mapsto \sum_{I_n^k}\varepsilon_{i_1}\cdots\varepsilon_{i_k}(P^{m-k}f)(X_{i_1},\dots,X_{i_k}),\qquad f \in \mathcal F, \]

be separable, and this follows by separability of the unit cube of $R^{|I_n^k|}$ for the Euclidean norm.] Then, by Fubini,

\[ |I_n^k|^{1/2} E\|U_n^{(k)}(\pi_k f)\| \le CB, \tag{3.12} \]

where

\[ B = E\int_0^{\sqrt{\|U_n^{(k)}((P^{m-k}f)^2)\|}}\Bigl(\log\frac{A\sqrt{U_n^{(k)}(P^{m-k}F^2)}}{\tau}\Bigr)^{k/2}d\tau. \tag{3.13} \]

Decompose the integral $B$ into two parts,

\[ (I) := E\int_0^{\sqrt{\|U_n^{(k)}((P^{m-k}f)^2)\|}}\Bigl(\log\frac{A\sqrt{U_n^{(k)}(P^{m-k}F^2)}}{\tau}\Bigr)^{k/2}d\tau\, I_n, \]

where

\[ I_n = I\bigl\{U_n^{(k)}(P^{m-k}F^2) > 4P^mF^2\bigr\} \]

and

\[ (II) := E\int_0^{\sqrt{\|U_n^{(k)}((P^{m-k}f)^2)\|}}\Bigl(\log\frac{A\sqrt{U_n^{(k)}(P^{m-k}F^2)}}{\tau}\Bigr)^{k/2}d\tau\, I_n^c, \]

so that $B = (I) + (II)$. The $(I)$ term is handled by the Arcones [3] exponential inequality (de la Peña and Giné [7], Theorem 4.1.13), which gives

\[ \Pr\bigl\{U_n^{(k)}(P^{m-k}F^2) > 4P^mF^2\bigr\} \le 4\exp\Bigl(-\frac{9n(P^mF^2)^2}{2k^2P^mF^4 + cP^mF^2}\Bigr) \le 4\exp\Bigl(-\frac{9nP^mF^2}{2k^2 + c}\Bigr) \tag{3.14} \]

for a constant $c$ that depends only on $k$. In the last inequality we have used $F \le 1$. Since, by change of variables,

\[ \int_0^{\sqrt{U_n^{(k)}(P^{m-k}F^2)}}\Bigl(\log\frac{A\sqrt{U_n^{(k)}(P^{m-k}F^2)}}{\tau}\Bigr)^{k/2}d\tau \le A\int_0^1(\log u^{-1})^{k/2}\,du\,\sqrt{U_n^{(k)}(P^{m-k}F^2)}, \]

and $P^k\bigl(U_n^{(k)}(P^{m-k}F^2)\bigr) \le P^mF^2$, it follows by Hölder's inequality and (3.14) that

\[ (I) \le C\|F\|_{L_2(P^m)}\exp\Bigl(-\frac{9n\|F\|^2_{L_2(P^m)}}{2(2k^2 + c)}\Bigr) \le \frac{D}{\sqrt n}. \tag{3.15} \]

As for $(II)$, we note

\[ (II) \le E\int_0^{\sqrt{\|U_n^{(k)}((P^{m-k}f)^2)\|}}\Bigl(\log\frac{2A\|F\|_{L_2(P^m)}}{\tau}\Bigr)^{k/2}d\tau\, I_n^c. \tag{3.16} \]

Now by integration we see that for any $0 < c < C$

\[ \int_0^c\bigl(\log(C/x)\bigr)^{k/2}\Bigl[1 - \frac{k}{2}\bigl(\log(C/x)\bigr)^{-1}\Bigr]dx = c\bigl(\log(C/c)\bigr)^{k/2}, \]

which when $(\log(C/c))^{-1} \le k^{-1}$ gives the inequality

\[ \int_0^c\bigl(\log(C/x)\bigr)^{k/2}dx \le 2c\bigl(\log(C/c)\bigr)^{k/2}. \tag{3.17} \]

Thus since on $I_n^c$

\[ 2A\|F\|_{L_2(P^m)}\Big/\sqrt{\|U_n^{(k)}((P^{m-k}f)^2)\|} \ge A \ge e^m \ge e^k, \tag{3.18} \]

we get from (3.17), (3.18) and (3.16) that

\[ (II) \le 2E\Bigl[\sqrt{\|U_n^{(k)}((P^{m-k}f)^2)\|}\Bigl(\log\frac{2A\|F\|_{L_2(P^m)}}{\sqrt{\|U_n^{(k)}((P^{m-k}f)^2)\|}}\Bigr)^{k/2}\Bigr]. \]

Since the function $\sqrt x(-\log x)^{k/2}$, $0 < x < 1$, is concave on $(0,e^{-k}]$ and $A \ge e^m \ge e^k$, this last bound is by Jensen's inequality

\[ \le 2\sqrt{E\|U_n^{(k)}((P^{m-k}f)^2)\|}\Bigl(\log\frac{2A\|F\|_{L_2(P^m)}}{\sqrt{E\|U_n^{(k)}((P^{m-k}f)^2)\|}}\Bigr)^{k/2}. \tag{3.19} \]
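The integration step leading to (3.17) can be checked directly. The following short derivation is a sketch, with $0 < c < C$ and $k \ge 1$ as in the text.

```latex
\begin{align*}
\frac{d}{dx}\Bigl[x\,\bigl(\log(C/x)\bigr)^{k/2}\Bigr]
 &= \bigl(\log(C/x)\bigr)^{k/2}\Bigl[1 - \tfrac{k}{2}\bigl(\log(C/x)\bigr)^{-1}\Bigr], \\
\int_0^c \bigl(\log(C/x)\bigr)^{k/2}\Bigl[1 - \tfrac{k}{2}\bigl(\log(C/x)\bigr)^{-1}\Bigr]dx
 &= c\,\bigl(\log(C/c)\bigr)^{k/2},
 \qquad\text{since } x\bigl(\log(C/x)\bigr)^{k/2}\to 0 \text{ as } x\downarrow 0 .
\end{align*}
If $\log(C/c)\ge k$, then $\log(C/x)\ge k$ for all $0<x\le c$, so the bracket is at least $1/2$ and
\[
\tfrac12\int_0^c \bigl(\log(C/x)\bigr)^{k/2}dx \;\le\; c\,\bigl(\log(C/c)\bigr)^{k/2}.
\]
```

This is where the requirement $A \ge e^m \ge e^k$ enters: by (3.18) it guarantees that the logarithm in the integrand of (3.16) is at least $k$ on the whole range of integration.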
We are going to show that there exists a $C > 0$ such that

\[ B \le C\Bigl[\frac{1}{\sqrt n} + \sqrt{\frac{B}{n^{k/2}} + \sigma^2}\,\Bigl(\log\frac{A\|F\|_{L_2(P^m)}}{\sigma}\Bigr)^{k/2}\Bigr]. \tag{3.20} \]

We shall consider two cases. In case 1, $E\|U_n^{(k)}((P^{m-k}f)^2)\| \le \sigma^2$. In this case, since the function $\sqrt x(-\log x)^{k/2}$, $0 < x < 1$, is increasing on $(0,e^{-k}]$, we get the trivial bound from (3.15) and (3.19) that

\[ B \le \frac{D}{\sqrt n} + 2\sigma\Bigl(\log\frac{2A\|F\|_{L_2(P^m)}}{\sigma}\Bigr)^{k/2}, \]

which of course implies the bound (3.20) (note that $A\|F\|_{L_2(P^m)}/\sigma \ge e^m$).

Next consider case 2, $E\|U_n^{(k)}((P^{m-k}f)^2)\| > \sigma^2$. To handle this case we must bound $E\|U_n^{(k)}((P^{m-k}f)^2)\|$. It is here that we will use the induction hypothesis. By Hoeffding's decomposition (3.3) we have

\[ E\|U_n^{(k)}((P^{m-k}f)^2)\| \le \sum_{r=0}^k\binom{k}{r}E\|U_n^{(r)}(\pi_r(P^{m-k}f)^2)\|. \tag{3.21} \]

The term corresponding to $r = 0$ is simply $\|P^k(P^{m-k}f)^2\| \le \sigma^2$. Consider now the new class

\[ \mathcal G := \{(P^{m-k}f)^2 : f \in \mathcal F\} \]

of functions of $k$ variables. For any probability measure $Q$ on $S^k$,

\[ Q\bigl((P^{m-k}f)^2 - (P^{m-k}g)^2\bigr)^2 \le 4Q\bigl(P^{m-k}(f - g)\bigr)^2 \le 4Q\times P^{m-k}(f - g)^2, \]

and thus the class $\mathcal G$ is VC-type with constants $A$ and $v$ and envelope $2\sqrt{P^{m-k}F^2}$ as in (2.9). Also observe that since

\[ \|P^k(P^{m-k}f)^4\| \le \|P^mf^2\| \le 4\sigma^2 \le 4\bigl\|\sqrt{P^{m-k}F^2}\bigr\|^2_{L_2(P^k)} = 4\|F\|^2_{L_2(P^m)}, \]

we verify that (3.11) holds. Also we trivially see that (3.10) is satisfied, that is,

\[ 4n\sigma^2 \ge n\sigma^2 \ge C_2\log\frac{A\|F\|_{L_2(P^m)}}{\sigma} = C_2\log\frac{2A\|F\|_{L_2(P^m)}}{2\sigma}. \]

Finally noting that $A \ge e^m \ge e^k$ and $2\sqrt{P^{m-k}F^2} \le 2$, we are permitted to apply the induction hypothesis for $1 \le r < k$ ($r = 0$ has already been dealt with) to get with $M = 2$,

\[ E\|U_n^{(r)}(\pi_r(P^{m-k}f)^2)\| \le C_1(k,A,v,2)\,n^{-r/2}\,2\sigma\Bigl(\log\frac{A\|F\|_{L_2(P^m)}}{\sigma}\Bigr)^{r/2}. \tag{3.22} \]

Since by the hypotheses (3.10) and (3.11) on $\sigma$ [note (3.11) and $F \le 1$ imply $\sigma^2 \le 1$] we have

\[ \frac{\sigma}{n^{r/2}}\Bigl(\log\frac{A\|F\|_{L_2(P^m)}}{\sigma}\Bigr)^{r/2} \le C_2^{-r/2}\sigma^{1+r} \le C_2^{-r/2}\sigma^2, \tag{3.23} \]

it follows from the bounds (3.22) and (3.23) that

\[ \binom{k}{r}E\|U_n^{(r)}(\pi_r(P^{m-k}f)^2)\| \le C\sigma^2,\qquad 1 \le r < k. \tag{3.24} \]

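As for $r = k$, we randomize and use the entropy bound for Rademacher chaos, just as we did at the beginning of the proof, using the fact that $\mathcal G$ is VC-type. This gives

\[ \begin{aligned} E\|U_n^{(k)}(\pi_k(P^{m-k}f)^2)\| &\le C|I_n^k|^{-1/2}E\Bigl\||I_n^k|^{-1/2}\sum_{I_n^k}\varepsilon_{i_1}\cdots\varepsilon_{i_k}(P^{m-k}f)^2(X_{i_1},\dots,X_{i_k})\Bigr\| \\ &\le \frac{C}{|I_n^k|^{1/2}}E\int_0^{\sqrt{\|U_n^{(k)}((P^{m-k}f)^4)\|}}\Bigl(\log\frac{2A\sqrt{U_n^{(k)}(P^{m-k}F^2)}}{\tau}\Bigr)^{k/2}d\tau \\ &\le \frac{CB}{|I_n^k|^{1/2}} \le \frac{CB}{n^{k/2}}, \end{aligned} \tag{3.25} \]

where $B$ is as defined in (3.13) and we use

\[ \|U_n^{(k)}((P^{m-k}f)^4)\| \le \|U_n^{(k)}((P^{m-k}f)^2)\|. \]

From (3.15), (3.19), (3.21)-(3.25) and $E\|U_n^{(k)}((P^{m-k}f)^2)\| > \sigma^2$ we get (3.20). Since $\log[A\|F\|_{L_2(P^m)}/\sigma] \ge 1$, hypothesis (3.10) on $\sigma$ gives

\[ \frac{1}{\sqrt n} \le C_2^{-1/2}\sigma, \]

and therefore it follows from inequality (3.20) that with perhaps a different value of $C$,

\[ B \le C\sqrt{\frac{B}{n^{k/2}} + \sigma^2}\,\Bigl(\log\frac{A\|F\|_{L_2(P^m)}}{\sigma}\Bigr)^{k/2}. \]

Taking squares and solving this inequality, we get that there exists a constant $C$ (easy to evaluate) such that

\[ B \le C\Bigl[\frac{\bigl(\log(A\|F\|_{L_2(P^m)}/\sigma)\bigr)^k}{n^{k/2}} + \sigma\Bigl(\log\frac{A\|F\|_{L_2(P^m)}}{\sigma}\Bigr)^{k/2}\Bigr]. \]

But, again by condition (3.10) on $\sigma$ as in (3.23),

\[ \frac{\bigl(\log(A\|F\|_{L_2(P^m)}/\sigma)\bigr)^{k/2}}{n^{k/2}} \le C\sigma^k \le C\sigma, \]

and therefore

\[ B \le C\sigma\Bigl(\log\frac{A\|F\|_{L_2(P^m)}}{\sigma}\Bigr)^{k/2}, \]

which, by inequality (3.12), proves the theorem. □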

Remark 4. An analogue of the previous theorem holds if we replace the function $\log(A\|F\|_{L_2(Q)}/\tau)$ by $H(\|F\|_{L_2(Q)}/\tau)$ with $H$ an increasing regularly varying function of exponent $0 \le \alpha < 2/m$, very much as in Theorem 3.1 in Giné and Koltchinskii [17].

3.4. U-statistics taking values in separable Banach spaces.

3.4.1. The $L_p$ case, $2 \le p < \infty$. We shall begin by establishing an exponential bound for the tail of the norm of a U-statistic taking values in a separable type 2 Banach space $B$. Let us recall that a separable Banach space $B$ is of type 2 if for any finite number of points $x_i \in B$ and independent Rademacher variables $\varepsilon_i$ (independent random signs), we have $E\|\sum_{i=1}^n\varepsilon_ix_i\|^2 \le C\sum_{i=1}^n\|x_i\|^2$, and the smallest such constant $C$ is the type 2 constant of $B$. It is well known that the $L_p$ spaces are of type 2 if (and only if) $p \ge 2$. If $Z_i$ are independent, centered B-valued random vectors with a square integrable norm and $B$ is of type 2, then

\[ E\Bigl\|\sum_{i=1}^n Z_i\Bigr\|^2 \le 4C\sum_{i=1}^n E\|Z_i\|^2 \tag{3.26} \]

(Araujo and Giné [2] or Ledoux and Talagrand [24]).
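The role of the condition $p \ge 2$ can be seen in a small exact computation. The sketch below evaluates the ratio $E\|\sum\varepsilon_ix_i\|_p^2/\sum\|x_i\|_p^2$ by enumerating all sign patterns, with the coordinate vectors of $R^n$ standing in for disjointly supported functions in $L_p$; the finite-dimensional setting and the function name are assumptions of the example.

```python
import itertools
import numpy as np

def type2_ratio(points, p):
    """Exact value of E||sum_i eps_i x_i||_p^2 / sum_i ||x_i||_p^2 over all signs."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        signed_sum = np.asarray(signs) @ points      # sum_i eps_i x_i
        total += np.linalg.norm(signed_sum, ord=p) ** 2
    mean_square = total / 2 ** n                     # average over the 2^n sign patterns
    denominator = sum(np.linalg.norm(x, ord=p) ** 2 for x in points)
    return mean_square / denominator

basis = np.eye(6)                     # six coordinate vectors
ratio_p1 = type2_ratio(basis, 1)      # grows linearly with the number of vectors
ratio_p2 = type2_ratio(basis, 2)      # Hilbert space case
ratio_p4 = type2_ratio(basis, 4)      # stays below 1
print(ratio_p1, ratio_p2, ratio_p4)
```

For $p = 1$ the ratio equals the number of vectors, so no fixed constant $C$ can work, whereas for $p = 2$ it is identically 1 and for $p = 4$ it decreases with the number of vectors.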
Theorem 9. Let $H$ be a function of $k$ variables, P-canonical, symmetric, with values in a type 2 Banach space. We have, for all $x > 0$,

\[ \Pr\Bigl\{\Bigl\|\frac{1}{\kappa\binom{n}{k}^{1/2}}\sum_{1\le i_1<\cdots<i_k\le n}H(X_{i_1},\dots,X_{i_k})\Bigr\| \ge x\Bigr\} \le D\exp\bigl\{-(x/\lambda_0)^{2/k}\bigr\}, \tag{3.27} \]

where $\kappa$ is a number satisfying

\[ \kappa \ge \sup_{x_1,\dots,x_k\in S}\|H(x_1,\dots,x_k)\| \tag{3.28} \]

and $\lambda_0$ and $D$ are constants that depend only on $k$ and the type 2 constant $C$ of $B$.
Proof. (This inequality was mentioned on page 252 of de la Peña and Giné [7] and its precise statement and proof were left to the reader.) Here is the proof.

Let

\[ S_n = \frac{1}{\kappa\binom{n}{k}^{1/2}}\sum_{1\le i_1<\cdots<i_k\le n}H(X_{i_1},\dots,X_{i_k}), \]

and let

\[ S_n' = \frac{1}{\kappa\binom{n}{k}^{1/2}}\sum_{1\le i_1<\cdots<i_k\le n}\varepsilon_{i_1}^{(1)}\cdots\varepsilon_{i_k}^{(k)}H(X_{i_1}^{(1)},\dots,X_{i_k}^{(k)}), \]

where $\kappa$ is as in (3.28), $X_i^{(j)}$ are i.i.d. with law $P$, and $\varepsilon_i^{(j)}$ are i.i.d. Rademacher variables independent of the collection of variables $\{X_i^{(j)}\}$.

By decoupling (Theorem 3.1.1 of de la Peña and Giné [7]) and convexity (e.g., Theorem 3.5.3 on page 140 of de la Peña and Giné [7], as it applies to a nonnegative, nondecreasing convex function $\Psi$), there is a constant $C_k$ such that

\[ \bigl\|\|S_n\|\bigr\|_{\Psi_{2/k}} \le C_k\bigl\|\|S_n'\|\bigr\|_{\Psi_{2/k}}. \tag{3.29} \]

By hypercontractivity of Rademacher chaos (Khinchin's inequality for Rademacher chaos), for example, Theorem 3.2.2 on page 113 of de la Peña and Giné [7], for all $r \ge 2$,

\[ E_\varepsilon\|S_n'\|^r \le r^{kr/2}\bigl(E_\varepsilon\|S_n'\|^2\bigr)^{r/2}, \]

and by the type 2 inequality applied one sequence $\{\varepsilon_i^{(j)}\}$, $j = 1,\dots,k$, at a time, we get

\[ \bigl(E_\varepsilon\|S_n'\|^2\bigr)^{r/2} \le C^{kr/2}, \]

where $C$ is the type 2 constant of the Banach space, and therefore,

\[ E_\varepsilon\|S_n'\|^r \le C^{kr/2}r^{kr/2}. \]

This inequality yields, by Taylor expansion of the exponential, that for some $c > 0$,

\[ E_\varepsilon\Psi_{2/k}(c\|S_n'\|) < \infty \]

and therefore that there exists a constant $\lambda_1 = \lambda_1(C,k)$ depending only on $C$ and $k$, such that

\[ E_\varepsilon\Psi_{2/k}(\|S_n'\|/\lambda_1) \le 1. \]

Integrating with respect to the $X$'s,

\[ E\Psi_{2/k}(\|S_n'\|/\lambda_1) \le 1,\quad\text{that is,}\quad \bigl\|\|S_n'\|\bigr\|_{\Psi_{2/k}} \le \lambda_1, \]

which, combined with (3.29), gives

\[ \bigl\|\|S_n\|\bigr\|_{\Psi_{2/k}} \le C_k\lambda_1 := \lambda_0, \]

a constant. Of course this implies, by the definition of the Orlicz norm $\Psi_{2/k}$, that there is a constant $D(k,C)$ such that

\[ \Pr\{\|S_n\| > x\} \le De^{-(x/\lambda_0)^{2/k}}. \]

□
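The passage from the moment bound to the Orlicz-norm bound in the proof of Theorem 9 can be spelled out as follows. This is a sketch under the assumption that $\Psi_{2/k}(x) = \exp(x^{2/k}) - 1$ (a normalization equivalent to it for large $x$ leads to the same conclusion with different constants); $C$ is the type 2 constant and $c > 0$ is a free parameter.

```latex
\begin{align*}
E_\varepsilon \Psi_{2/k}(c\|S_n'\|)
 &= \sum_{j\ge 1} \frac{c^{2j/k}}{j!}\, E_\varepsilon\|S_n'\|^{2j/k}, \\
E_\varepsilon\|S_n'\|^{2j/k}
 &\le \bigl(E_\varepsilon\|S_n'\|^{2}\bigr)^{j/k} \le C^{j}
 && (1\le j<k,\ \text{H\"older's inequality}), \\
E_\varepsilon\|S_n'\|^{2j/k}
 &\le C^{j}\Bigl(\frac{2j}{k}\Bigr)^{j}
 && (j\ge k,\ \text{moment bound with } r = 2j/k\ge 2), \\
\frac{c^{2j/k}}{j!}\,C^{j}\Bigl(\frac{2j}{k}\Bigr)^{j}
 &\le \Bigl(\frac{2eC\,c^{2/k}}{k}\Bigr)^{j}
 && (\text{using } j!\ge (j/e)^j).
\end{align*}
```

Hence the series converges as soon as $2eC\,c^{2/k} < k$, and its sum tends to zero as $c \downarrow 0$ at a rate that depends only on $C$ and $k$; choosing $c = 1/\lambda_1$ small enough makes the sum at most 1.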



3.4.2. The $L_p$ case, $1 \le p < 2$. Our aim now is to obtain a useful exponential inequality for B-valued U-statistics, when $B$ is not necessarily a type 2 Banach space. We will generalize to B-valued U-statistics (via decoupling followed by iteration) the following sharp inequality for sums of independent random vectors: there is a constant $L < \infty$ such that if $B$ is a separable Banach space, $Z_i$, $i \in N$, are independent mean zero random vectors taking values in $B$ and $r \ge 2$, then, setting $S_n = \sum_{i=1}^nZ_i$,

\[ E\|S_n\|^r \le L^r\Bigl[r^{r/2}\bigl(E\|S_n\|^2\bigr)^{r/2} + r^rE\max_{1\le i\le n}\|Z_i\|^r\Bigr]. \tag{3.30} \]

This inequality was obtained by Pinelis [28], and it also follows easily from the sharper inequality in Giné, Latała and Zinn [18], Proposition 3.1. Next we extend inequality (3.30) to B-valued U-statistics.

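Theorem 10. Let $B$ be a separable Banach space, let $H : S^k \mapsto B$ be a bounded P-canonical random vector symmetric in its entries and let $X_i, X_i^{(j)}$ be i.i.d. S-valued random variables, $1 \le i \le n$ and $j = 1,\dots,k$. Define $\kappa$ and $\lambda_n$ to be any pair of numbers such that

\[ \kappa \ge \sup_{x_1,\dots,x_k\in S}\|H(x_1,\dots,x_k)\| \quad\text{and}\quad \lambda_n \ge \Bigl(E\Bigl\|\sum_{i\in I_n^k}H(X_{i_1}^{(1)},\dots,X_{i_k}^{(k)})\Bigr\|^2\Bigr)^{1/2}. \tag{3.31} \]

Then there exists a constant $C$ depending only on $k$ such that, for all $n \in N$ and $r \ge 2$,

\[ E\Bigl\|\sum_{i\in I_n^k}H(X_{i_1}^{(1)},\dots,X_{i_k}^{(k)})\Bigr\|^r \le C^r\bigl[r^{kr/2}\lambda_n^r + r^{(k+1)r/2}n^{(k-1)r}\kappa^r + r^{kr}\kappa^r\bigr]. \tag{3.32} \]

Moreover, there exists a constant $D_0$ depending only on $k$ such that, for all $n \in N$ and $r \ge 2$,

\[ E\Bigl\|\sum_{i\in I_n^k}H(X_{i_1},\dots,X_{i_k})\Bigr\|^r \le D_0^r\bigl[r^{kr/2}\lambda_n^r + r^{(k+1)r/2}n^{(k-1)r}\kappa^r + r^{kr}\kappa^r\bigr], \tag{3.33} \]

where $\kappa$ and $\lambda_n$ are defined as in (3.31).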

Proof. Inequality (3.30) gives the result for $k = 1$. Assume the result is true for $k$ and for every Banach space, and let $H$ be a function of $k + 1$ variables satisfying the conditions in the statement of the theorem. Before starting the induction, we describe some simplifying notation. We will denote by $\lesssim$ inequality up to a multiplicative constant $C^r$ with $C$ depending only on $k$. Also, we will write $H_i = H$ if the coordinates of the multi-index $i \in \{1,\dots,n\}^{k+1}$ are all different, and $H_i = 0$ otherwise, and we will drop the arguments of $H$ or $H_i$, so $\sum_iH_i$ will mean $\sum_{i\in\{1,\dots,n\}^{k+1}}H_i(X_{i_1}^{(1)},\dots,X_{i_{k+1}}^{(k+1)})$. Finally, for $A \subset \{1,\dots,k+1\}$, $E_A$ will mean integration with respect to the variables $X_i^{(r)}$ only, for $r \in A$ and $i \le n$.

Applying (3.30) conditionally on the variables $X_i^{(j)}$, $j < k + 1$, we obtain

\[ E\Bigl\|\sum_iH_i\Bigr\|^r \lesssim r^{r/2}E_{1,\dots,k}\Bigl(E_{k+1}\Bigl\|\sum_iH_i\Bigr\|^2\Bigr)^{r/2} + r^rE_{1,\dots,k}E_{k+1}\max_{i_{k+1}}\Bigl\|\sum_{i_1,\dots,i_k}H_i\Bigr\|^r. \tag{3.34} \]

In order to deal with the first term in (3.34), for each $x_1,\dots,x_k$ fixed we consider the random variable $\sum_{i_{k+1}}H_i(x_1,\dots,x_k,X_{i_{k+1}}^{(k+1)})$ as a function from $S^k$ into the Banach space $L_2(\Omega,\Sigma,\Pr;B)$ of B-valued random variables whose B norms are square integrable, with norm

\[ |\xi|_* = \bigl(E\|\xi\|^2\bigr)^{1/2}. \]

To apply the induction hypothesis to the statistic $\sum_{i_1,\dots,i_k}\bigl(\sum_{i_{k+1}}H_i\bigr)$ in this Banach space with the norm $|\cdot|_*$, we first note that, if we denote by $\bar\kappa$ and $\bar\lambda_n$ the corresponding quantities associated with this statistic, then

\[ \bar\kappa = \max_{x_1,\dots,x_k\in S}\Bigl(E_{k+1}\Bigl\|\sum_{i_{k+1}}H_i(x_1,\dots,x_k,X_{i_{k+1}}^{(k+1)})\Bigr\|^2\Bigr)^{1/2} \le n\kappa \]

and

\[ \bar\lambda_n = \Bigl(E_{1,\dots,k}E_{k+1}\Bigl\|\sum_{i_1,\dots,i_k}\sum_{i_{k+1}}H_i\Bigr\|^2\Bigr)^{1/2} \le \lambda_n. \]

Hence, the induction hypothesis gives

\[ r^{r/2}E_{1,\dots,k}\Bigl(E_{k+1}\Bigl\|\sum_{i_1,\dots,i_k}\sum_{i_{k+1}}H_i\Bigr\|^2\Bigr)^{r/2} \lesssim r^{(k+1)r/2}\lambda_n^r + r^{(k+2)r/2}n^{kr}\kappa^r + r^{(k+1/2)r}n^r\kappa^r. \]

Now, if $n \le r^{1/2}$, then $r^{(k+1/2)r}n^r \le r^{(k+1)r}$ and if $n > r^{1/2}$, then $r^{(k+1/2)r}n^r \le r^{(k+2)r/2}n^{kr}$, so that

\[ r^{r/2}E_{1,\dots,k}\Bigl(E_{k+1}\Bigl\|\sum_iH_i\Bigr\|^2\Bigr)^{r/2} \lesssim r^{(k+1)r/2}\lambda_n^r + r^{(k+2)r/2}n^{kr}\kappa^r + r^{(k+1)r}\kappa^r. \tag{3.35} \]

Finally, we apply the induction hypothesis conditionally on the variables $X_i^{(k+1)}$ to the second term of (3.34) by considering that term to be a U-statistic with values in the Banach space $\ell_\infty^n(B) := \{(v_1,\dots,v_n) : v_i \in B\}$, with norm $|(v_1,\dots,v_n)| := \max_{1\le i\le n}\|v_i\|$. In fact, with this definition,

\[ \max_{i_{k+1}}\Bigl\|\sum_{i_1,\dots,i_k}H_i\Bigr\| = \Bigl|\sum_{i_1,\dots,i_k}\bigl(H_{i_1,\dots,i_k,1}(X_{i_1}^{(1)},\dots,X_{i_k}^{(k)},X_1^{(k+1)}),\dots,H_{i_1,\dots,i_k,n}(X_{i_1}^{(1)},\dots,X_{i_k}^{(k)},X_n^{(k+1)})\bigr)\Bigr| \]

and if we denote by $\bar\kappa$ and $\bar\lambda_n$ the corresponding parameters, we have, for each value of $X_j^{(k+1)}$, $j = 1,\dots,n$,

\[ \bar\kappa = \max_{x_1,\dots,x_k\in S}\bigl|\bigl(H(x_1,\dots,x_k,X_1^{(k+1)}),\dots,H(x_1,\dots,x_k,X_n^{(k+1)})\bigr)\bigr| \le \kappa \]

and

\[ \bar\lambda_n = \Bigl(E_{1,\dots,k}\max_{i_{k+1}}\Bigl\|\sum_{i_1,\dots,i_k}H_i\Bigr\|^2\Bigr)^{1/2} \le n^k\kappa. \]

So, induction gives

\[ \begin{aligned} r^rE_{1,\dots,k}\max_{i_{k+1}}\Bigl\|\sum_{i_1,\dots,i_k}H_i\Bigr\|^r &\lesssim r^r\bigl[r^{kr/2}\bar\lambda_n^r + r^{(k+1)r/2}n^{(k-1)r}\kappa^r + r^{kr}\kappa^r\bigr] \\ &\le \kappa^r\bigl[r^{(k+2)r/2}n^{kr} + r^{(k+3)r/2}n^{(k-1)r} + r^{(k+1)r}\bigr]. \end{aligned} \]

By considering the cases $n \le r^{1/2}$ and $n > r^{1/2}$ we see that

\[ r^{(k+3)r/2}n^{(k-1)r} \le \max\bigl(r^{(k+2)r/2}n^{kr}, r^{(k+1)r}\bigr), \]

from which it follows that

\[ r^rE_{1,\dots,k}\max_{i_{k+1}}\Bigl\|\sum_{i_1,\dots,i_k}H_i\Bigr\|^r \lesssim \kappa^r\bigl[r^{(k+2)r/2}n^{kr} + r^{(k+1)r}\bigr]. \tag{3.36} \]

Now, the first part of the theorem follows by substituting the estimates (3.35) and (3.36) into inequality (3.34). The proof of the theorem is completed by noting that (3.33) follows from (3.32) via de la Peña's [6] decoupling inequality (see Theorem 3.1.1 in de la Peña and Giné [7]). □
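The two case analyses in $n \le r^{1/2}$ versus $n > r^{1/2}$ used to reach (3.35) and (3.36) amount to elementary comparisons of powers of $r$ and $n$. The following sketch checks both comparisons on a grid, working with logarithms to avoid overflow; the grid, the tolerance and the function names are arbitrary choices made for the illustration.

```python
import itertools
import math

def log_power(r_exp, n_exp, r, n):
    # logarithm of r**r_exp * n**n_exp
    return r_exp * math.log(r) + n_exp * math.log(n)

def step_for_335(n, r, k, tol=1e-6):
    # r^{(k+1/2)r} n^r <= max(r^{(k+1)r}, r^{(k+2)r/2} n^{kr})
    lhs = log_power((k + 0.5) * r, r, r, n)
    rhs = max(log_power((k + 1) * r, 0.0, r, n),
              log_power((k + 2) * r / 2, k * r, r, n))
    return lhs <= rhs + tol

def step_for_336(n, r, k, tol=1e-6):
    # r^{(k+3)r/2} n^{(k-1)r} <= max(r^{(k+2)r/2} n^{kr}, r^{(k+1)r})
    lhs = log_power((k + 3) * r / 2, (k - 1) * r, r, n)
    rhs = max(log_power((k + 2) * r / 2, k * r, r, n),
              log_power((k + 1) * r, 0.0, r, n))
    return lhs <= rhs + tol

n_values = range(1, 61)
r_values = [2.0, 2.5, 3.0, 10.0, 50.0, 400.0, 5000.0]
k_values = range(1, 6)
all_ok = all(step_for_335(n, r, k) and step_for_336(n, r, k)
             for n, r, k in itertools.product(n_values, r_values, k_values))
print(all_ok)
```

In both comparisons the threshold $n = r^{1/2}$ is exactly where the two candidate upper bounds coincide, which is why the case split is made there.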

We note that this theorem could have been proved, with only formal changes, for $H_i$ depending on the subindices $i$. Giné, Latała and Zinn, unpublished, have a moment inequality for B-valued canonical U-statistics of order 2 without boundedness assumptions that contains (3.32) for $k = 2$. Our proof is inspired by theirs.

We shall apply (3.33) to the special case when $B = L_p(R^d)$, $1 \le p < 2$, and $H(x_1,\dots,x_k) = \pi_k\bar K_h(\cdot,x_1,\dots,x_k)$, to obtain the following. Recall the definition (2.21) of $\mu_s$.

Corollary 3. Let $K$, $f_g$ and $s > 0$ satisfy the conditions of Theorem 3 for $1 \le p < 2$. Then there exist $\gamma > 0$ and $D > 0$ such that for all $n \ge 1$ and $x > 0$,

i/fe> 



f v p j \\Zl> t 'KkK h (;X il ,...,X ik ) 

1 ' ' I Dn^/Vh 
where \\K\\ Pt2jS = \\K\\ p V \\K\\ L2 ^ g) . 



> x > < 7 exp 



\K\ 



p,2,s 



For the proof, first, note the following easy bound for ||||7rfeif /i||p||oo) 
namely, we can choose some A\ > 0, so that 



(3.38) K:=A 1 \\K\\ p /h 



i-i/ 'p 



> sup \\ir k K h (-,xi,...,x k ) 

x lt ...,x k £S 
To see this, just observe that $\pi_k \bar K_h(t, x_1,\dots,x_k)$ is a linear combination of
terms of the form $h_\ell(t, x_{i_1},\dots,x_{i_\ell})$ with $(i_1,\dots,i_\ell) \in I_k^\ell$ and $0 \le \ell \le k$, where
$h_0(t) = P^m \bar K_h$ and, for $\ell \ge 1$,

$$h_\ell(t, x_1,\dots,x_\ell) = P^{m-\ell} \bar K_h(t, x_1,\dots,x_\ell, X_{\ell+1},\dots,X_m) = P^{m-\ell}\, \frac{1}{m!} \sum_{\sigma} K_h\bigl(t - g_\sigma(x_1,\dots,x_\ell, X_{\ell+1},\dots,X_m)\bigr)$$

and $g_\sigma(y_1,\dots,y_m) = g(y_{\sigma_1},\dots,y_{\sigma_m})$. Then Jensen's inequality for expectations
and averages and a substitution yield

$$\int |h_\ell(t, x_1,\dots,x_\ell)|^p\,dt \le \int |K_h(t)|^p\,dt = h^{1-p} \int |K(t)|^p\,dt.$$
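
The substitution behind the last equality can be written out as follows. This is a sketch under the assumption that the rescaled kernel on $\mathbb{R}^d$ is $K_h(t) = h^{-1} K(t/h^{1/d})$, which is the convention suggested by the formulas used later in the separability remark.

```latex
% Assumes K_h(t) = h^{-1} K(t / h^{1/d}) for t in R^d and h > 0.
\begin{align*}
\int_{\mathbb{R}^d} |K_h(t)|^p \, dt
  &= h^{-p} \int_{\mathbb{R}^d} \bigl|K\bigl(t/h^{1/d}\bigr)\bigr|^p \, dt \\
  &= h^{-p} \cdot h \int_{\mathbb{R}^d} |K(u)|^p \, du
     && (t = h^{1/d} u,\ dt = h \, du) \\
  &= h^{1-p} \, \|K\|_p^p .
\end{align*}
% Taking p-th roots: \|K_h\|_p = \|K\|_p / h^{1 - 1/p},
% which is the order of magnitude of \kappa in (3.38).
```

Taking $p$-th roots explains the factor $h^{1-1/p}$ in the denominator of (3.38).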



We shall also need a good bound for $E\|\sum_{I_n^k} \pi_k \bar K_h(\cdot, X_{i_1},\dots,X_{i_k})\|_p$ in
order to estimate $\lambda_n$ in (3.31). Such a bound will be based on the following
lemma, which is an extension of ideas and results in Chapter 7 in Devroye
[8] and Section 3 of Devroye [9].

Lemma 1. Let $1 \le p < 2$. If $f$ and $k$ are two nonnegative functions on
$\mathbb{R}^d$ such that $f, k \in L_1(\mu_s)$ for some $s > d(2-p)/p$, then, for any $0 < b < \infty$,

$$\sup_{h \in (0,b]} \int (k_h * f)^{p/2}(y)\,dy \le C\bigl(\|f\|_{L_1(\mu_s)} \|k\|_{L_1(\mu_s)}\bigr)^{p/2}$$
for some constant $C$ that depends only on $d$, $p$ and $s$.



Proof. Let $q = sp/(2-p)$ and note that $v(y) := (1+|y|)^{-q}$ is integrable.
Then, by Jensen's inequality with respect to the probability measure $v(y)\,dy/\|v\|_1$, we
have

$$\int (k_h * f)^{p/2}(y)\,dy = \int v(y)^{-1} (k_h * f)^{p/2}(y)\, v(y)\,dy \le \|v\|_1^{1-p/2} \Bigl(\int (1+|y|)^{s} (k_h * f)(y)\,dy\Bigr)^{p/2}.$$

Since $(1+|u+v|) \le (1+|u|)(1+|v|)$ and $1+|hu| \le (1 \vee h)(1+|u|)$ for $h > 0$,
we also have

$$\int (1+|y|)^s (k_h * f)(y)\,dy \le \iint (1+|y-x|)^s (1+|x|)^s\, k_h(y-x) f(x)\,dx\,dy$$
$$\le \|f\|_{L_1(\mu_s)} \int (1+|h^{1/d}u|)^s\, k(u)\,du \le (1 \vee b^{s/d})\, \|f\|_{L_1(\mu_s)} \|k\|_{L_1(\mu_s)}.$$
The lemma follows from these inequalities. □
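
The exponent bookkeeping in this proof is short but easy to lose track of, so we record it here as a sketch. It assumes, as the statement of the lemma suggests, that $\mu_s$ has density $(1+|x|)^s$ with respect to Lebesgue measure; $\phi$ is a placeholder name for the nonnegative function $k_h * f$.

```latex
% q = sp/(2-p), v(y) = (1+|y|)^{-q}; \phi >= 0 stands for k_h * f.
\begin{align*}
s > \frac{d(2-p)}{p}
  &\iff q = \frac{sp}{2-p} > d
   \quad\Longrightarrow\quad
   \|v\|_1 = \int_{\mathbb{R}^d} (1+|y|)^{-q}\,dy < \infty, \\
\int \phi^{p/2}\,dy
  &= \|v\|_1 \int \bigl(v^{-2/p}\phi\bigr)^{p/2}\,\frac{v\,dy}{\|v\|_1}
   \le \|v\|_1^{\,1-p/2}\Bigl(\int v^{1-2/p}\,\phi\,dy\Bigr)^{p/2}, \\
v^{1-2/p}(y) &= (1+|y|)^{q(2-p)/p} = (1+|y|)^{s}.
\end{align*}
```

The middle line is Jensen's inequality for the concave function $x \mapsto x^{p/2}$, which is where the restriction $p < 2$ enters.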






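Lemma 2. If $K \in L_2(\mathbb{R}^d)$ and $h > 0$, then, for $1 \le k \le m$ and $t \in \mathbb{R}^d$,

$$E\bigl(\pi_k \bar K_h(t, X_1,\dots,X_k)\bigr)^2 \le \frac{((K^2)_h * f_g)(t)}{h}. \tag{3.39}$$

Let now $K$, $f_g$ and $s > 0$ satisfy the conditions of Theorem 3 for $1 \le p < 2$,
and let $0 < b < \infty$. Then there exists $C < \infty$ such that, for $1 \le k \le m$ and
all $0 < h \le b$,

$$\Bigl(E\Bigl\|\sum_{I_n^k} \pi_k \bar K_h(\cdot; X_{i_1},\dots,X_{i_k})\Bigr\|_p^2\Bigr)^{1/2} \le C \|K\|_{L_2(\mu_s)}\, \frac{n^{k/2}}{\sqrt h}. \tag{3.40}$$
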

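Proof. Using the fact that $\pi_k$ is a projection in $L_2(\Pr)$, and applying
convexity of the square function to the symmetrization of $K_h$, we have

$$E\bigl(\pi_k \bar K_h(t, X_1,\dots,X_k)\bigr)^2 \le \frac{1}{h^2}\, E K^2\Bigl(\frac{t - g(X_1,\dots,X_m)}{h^{1/d}}\Bigr) = \frac{1}{h}\, ((K^2)_h * f_g)(t),$$

that is, (3.39). Next, by orthogonality,

$$E\Bigl(\sum_{I_n^k} \pi_k \bar K_h(t; X_{i_1},\dots,X_{i_k})\Bigr)^2 = \frac{n!\,k!}{(n-k)!}\, E\bigl(\pi_k \bar K_h(t, X_1,\dots,X_k)\bigr)^2.$$

Then the Minkowski inequality for integrals (e.g., Folland [14], page 194)
and the above results yield

$$\Bigl(E\Bigl\|\sum_{I_n^k} \pi_k \bar K_h(\cdot; X_{i_1},\dots,X_{i_k})\Bigr\|_p^2\Bigr)^{1/2} \le \Bigl(\int \Bigl(E\Bigl(\sum_{I_n^k} \pi_k \bar K_h(t; X_{i_1},\dots,X_{i_k})\Bigr)^2\Bigr)^{p/2} dt\Bigr)^{1/p}$$
$$\le \Bigl(\frac{m!\, n^k}{h}\Bigr)^{1/2} \Bigl(\int ((K^2)_h * f_g)^{p/2}(t)\,dt\Bigr)^{1/p}.$$
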

The result (3.40) follows now from Lemma 1. □ 

Proof of Corollary 3. It follows from Lemma 2 and from (3.38)
that we can take, for some $A_2 > 0$,

$$\lambda_n := A_2\, n^{k/2} \|K\|_{p,2,s}/\sqrt h \quad\text{and}\quad \kappa := A_1 \|K\|_{p,2,s}/h^{1-1/p}.$$
Assume that $0 < h \le 1$ and $r \ge 2$. Then we have the bounds

$$r^{k/2}\lambda_n = A_2 \|K\|_{p,2,s}\, r^{k/2} n^{k/2}/\sqrt h \le A_2 \|K\|_{p,2,s}\, r^{k} n^{k-1}/\sqrt h,$$
$$r^{(k+1)/2} n^{k-1}\kappa = A_1 \|K\|_{p,2,s}\, r^{(k+1)/2} n^{k-1}/h^{1-1/p} \le A_1 \|K\|_{p,2,s}\, r^{k} n^{k-1}/\sqrt h$$

and 



r K = r 



) A 1 \\K\\ P!2tS /h 1 - 1 /P<A 1 \\K\\ P!2 . 



s r n 



/Vh. 



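By (3.33), this says that for some $D > 0$, for all $r \ge 2$,

$$E\Bigl\|\sum_{I_n^k} \pi_k \bar K_h(\cdot, X_{i_1},\dots,X_{i_k})\Bigr\|_p^r \le \bigl(D \|K\|_{p,2,s}\, r^k n^{k-1}/\sqrt h\bigr)^r$$

and thus

$$E\Bigl\|\sum_{I_n^k} \pi_k \bar K_h(\cdot, X_{i_1},\dots,X_{i_k})\Bigr\|_p^{r/k} \le \Bigl(E\Bigl\|\sum_{I_n^k} \pi_k \bar K_h(\cdot, X_{i_1},\dots,X_{i_k})\Bigr\|_p^{r}\Bigr)^{1/k} \le \bigl(D \|K\|_{p,2,s}\, r^k n^{k-1}/\sqrt h\bigr)^{r/k}$$
$$= \bigl(3^k D \|K\|_{p,2,s}\, n^{k-1}/\sqrt h\bigr)^{r/k}\, r^r/3^r.$$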

The same bound holds for $r = 1$. From this moment bound we easily get
that for all $k \ge 2$,

$$E \exp\Biggl\{\biggl(\frac{\bigl\|\sum_{I_n^k} \pi_k \bar K_h(\cdot, X_{i_1},\dots,X_{i_k})\bigr\|_p}{3^k D \|K\|_{p,2,s}\, n^{k-1}/\sqrt h}\biggr)^{1/k}\Biggr\} \le \sum_{r=0}^{\infty} \frac{r^r}{3^r\, r!} =: \gamma < \infty.$$

The desired result (3.37) follows now from Markov's inequality and a
renaming of $D$. □
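
For the reader's convenience, here is a sketch of why the series defining $\gamma$ converges and how Markov's inequality produces the exponential tail. The symbol $Y$ is a placeholder introduced here for the normalized norm appearing inside the exponential, and the convention $0^0 := 1$ is used for the $r = 0$ term.

```latex
% Y >= 0 is the normalized norm inside the exponential moment; 0^0 := 1.
\begin{align*}
E \exp\bigl(Y^{1/k}\bigr)
  &= \sum_{r=0}^{\infty} \frac{E\, Y^{r/k}}{r!}
     && \text{(monotone convergence)} \\
  &\le \sum_{r=0}^{\infty} \frac{r^r}{3^r\, r!}
   \le \sum_{r=0}^{\infty} \Bigl(\frac{e}{3}\Bigr)^{r}
   = \frac{3}{3-e} < \infty
     && \bigl(r! \ge (r/e)^r\bigr), \\
\Pr\bigl\{Y^{1/k} > x\bigr\}
  &\le e^{-x}\, E \exp\bigl(Y^{1/k}\bigr)
   \le \gamma\, e^{-x}
     && \text{(Markov's inequality)}.
\end{align*}
```

The bound $r! \ge (r/e)^r$ follows from $e^r = \sum_j r^j/j! \ge r^r/r!$.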

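Notice for future use that by (3.37) we get via (2.16) that for some $C > 0$,

$$\Biggl\| \frac{\sqrt n\, \bigl\|\sum_{I_n^k} \pi_k \bar K_h(\cdot, X_{i_1},\dots,X_{i_k})\bigr\|_p}{n^k} \Biggr\|_{\psi_{1/k}} \le \frac{C D \|K\|_{p,2,s}}{\sqrt{nh}}. \tag{3.41}$$
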
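
The rate $1/\sqrt{nh}$ in (3.41) is just the scale of (3.37) multiplied by the normalization. As a quick check, assuming the normalizing factor in (3.41) is $\sqrt n / n^k$ (our reading of the display):

```latex
% Scale in (3.37) times the assumed normalization sqrt(n)/n^k in (3.41).
\[
\frac{\sqrt{n}}{n^{k}} \cdot \frac{D\, n^{k-1}\, \|K\|_{p,2,s}}{\sqrt{h}}
  = \frac{D\, \|K\|_{p,2,s}}{\sqrt{n}\,\sqrt{h}}
  = \frac{D\, \|K\|_{p,2,s}}{\sqrt{nh}} .
\]
```

The extra constant $C$ in (3.41) comes from passing from the tail bound to the Orlicz norm via (2.16).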



3.5. Completion of the proofs of the CLT. We are now ready to prove
(3.8) for $1 \le p \le \infty$, which will complete the proofs of Theorems 1, 2 and 3
(Section 3.1). We begin with the case $p = \infty$.

Proof of Theorem 1. Let 

$$\mathcal{K}_{g,n} := \bigl\{K\bigl((y - g(\cdot))/h^{1/d}\bigr) : y \in \mathbb{R}^d,\ a h_n \le h \le b h_n\bigr\}.$$

Since $N(\mathcal{K}_{g,n}, L_2(Q), \varepsilon) = N(\mathcal{K}_n, L_2(Q \circ g^{-1}), \varepsilon)$ for every probability measure
$Q$ on $\mathbb{R}^m$, it follows from condition (a) in Theorem 1 that the classes
$\mathcal{K}_{g,n}$ are VC-type with the same $A$ and $v$ and with envelopes $F_n(g)$. Moreover,
since the map

$$(y, h, x_1,\dots,x_m) \mapsto K\bigl((y - g(x_1,\dots,x_m))/h^{1/d}\bigr)$$

is jointly measurable, these classes are image admissible Suslin, hence measurable.
So, we can apply Theorem 8 to them. Since by Minkowski's inequality

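$$(\lambda h_n)^2\, E \bar K^2_{\lambda h_n}(y, X_1,\dots,X_m) \le E K^2\Bigl(\frac{y - g(X_1,\dots,X_m)}{(\lambda h_n)^{1/d}}\Bigr)$$
$$= \int K^2\Bigl(\frac{y - w}{(\lambda h_n)^{1/d}}\Bigr) f_g(w)\,dw \le \|f_g\|_\infty \|K\|_2^2\, b h_n,$$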

we can take a 2 = Ch n with C = ^ll/gllooll-^lll/IKfc-^lloo- Hence, Theorem 8 
gives that there is a constant C k such that, for all n and k, 



i?|sup 



~ A^n TTfei^Afen (*) Xil 1 ■ ■ ■ i^ifc, 



n k/2 

Jk 



:t£~R d ,a< \<b\ 



<C k Vh n [ly\og(A\\F n \\ L2ifg) /Vh n )]. 
Hence, if nh n /[l V log(A\\F n \\ L2 ^f g ^/\/Ti^ )] 2 — ► oo, then for all k = 2, . . . ,m, 

(3.42) £ sup ||VnC/A HTrfc^AhJIloo < — 1/2 >0, 

a<X<b n^-^^hr! 

proving (3.8) in the case. □ 

Proof of Corollary 1. By the Hoeffding decomposition (3.5), it 
suffices to show [in analogy with Proposition 1 and (3.8)] that 

(3.43) sup lA^mC/W (7Ti^ A/ln (0, O)-^n(O)) HO in pr 

0<A<1 

and 

(3.44) sup XV^\U r [ k \7r k K Xhn (0, -))h0, k = 2,...,m. 

0<A<1 

As in the proof of Proposition 1, (3.43) reduces to proving 



sup A / \v n {-u) - v n {ti)\\Kxh n {u)\ du ->• 
0<A<1 J 

for $i = 1,\dots,m$. Let $M$ be such that $D$ is contained in the ball of radius $M$
about the origin in $\mathbb{R}^d$. Since $K_{\lambda h_n}(u) = I\{u \in (\lambda h_n)^{1/d} D\}/(\lambda h_n \mathrm{Vol}(D))$ is
zero for $|u| > (\lambda h_n)^{1/d} M$, we have

sup A / \V n (—u)—V n (0)\\K\h n (u)\du< sup |F n (-u) - V n (0)\, 



0<A<1 



\u\<h 1 J d M 






which tends to zero in probability by condition (2.13), proving (3.43). 
For each $n$, the class of functions $\{I_{(\lambda h_n)^{1/d} D} : 0 < \lambda \le 1\}$ is VC because it is
linearly ordered (these are the simplest VC classes), and its envelope is
contained in the class. The same is true for the class $\{\lambda h_n \bar K_{\lambda h_n}(0, \cdot,\dots,\cdot)/
\|\pi_k \bar K(0, \cdot)\|_\infty : 0 < \lambda \le 1\}$. Then the logarithm in the bound (3.9) in
Theorem 8 is simply a constant, and that theorem gives

n k/ \, W(n Hn rr-E sup {XU^^K xhn (0, v, 0)1 < Ca 

\\ir k K (U, -JHoo o<A<i 

as long as no 1 > C , for fixed constants C and C and for k = 2, . . . , m, where 
we can take 

2 _ 1 1 fg 1 1 oo 

° "VolOD)||7T fe Z(0, -)lllo 

because, by change of variables, 

sup E[X 2 hlKl h J-g(X 1 ,...,X m )/(\h n ) 1 / d )]<hJf g \\ 00 /\ol(D). 

0<A<1 

Since $nh_n \to \infty$ by hypothesis, the condition $n\sigma^2 > C$ is satisfied, and therefore,
we can apply Theorem 8 and obtain

sup \^i\U^\<K k K Xhn ^ -,...,•))!< ~ % 1/2 - 

0<A<1 n (k-l)/2 h V 2 
for all 2 < k < m, proving (3.44). □ 

To finish the proof of Theorems 2 and 3 is more complicated because we 
must deal with a mixed norm, the sup over A of an L p norm. The proof 
will consist in showing that the exponential inequalities from the above 
subsections lead to an entropy bound of the random variables in (3.8). 

Given a kernel $L$ in $L_p(\mathbb{R}^d)$, $1 \le p < \infty$, and $h > 0$, let $\bar L_h(t, x_1,\dots,x_m)$
be the symmetrization of $L_h(t - g(x_1,\dots,x_m))$, as in (3.4). We observe that,
just as in (3.38), there is a constant $c < \infty$ that depends only on $k$ such that

$$\bigl\| \|\pi_k \bar L_h\|_p \bigr\|_\infty \le \frac{c \|L\|_p}{h^{(p-1)/p}}, \qquad k = 1,\dots,m. \tag{3.45}$$

Because of (3.45), it makes sense to define

$$X_{L,n} := \frac{h_n^{(p-1)/p}}{n^{k/2}} \Bigl\| \sum_{\mathbf i \in I_n^k} \pi_k \bar L_{h_n}(\cdot, X_{i_1},\dots,X_{i_k}) \Bigr\|_p. \tag{3.46}$$


Proof of (3.8) for $2 \le p < \infty$. For $p \ge 2$, $L_p$ is of type 2 and Theorem
9 then gives that, for constants $C$ and $D$ that depend only on $k$ and $p$, for all
$x > 0$,

$$\Pr\{X_{L,n} > x\} \le D \exp\Bigl\{-\Bigl(\frac{x}{C \|L\|_p}\Bigr)^{2/k}\Bigr\}, \tag{3.47}$$

and then, by (2.16), we have for another constant $C$ that depends on $k$ and
$p$,

$$\|X_{L,n}\|_{\psi_{2/k}} \le C \|L\|_p, \tag{3.48}$$

where $\psi_{2/k}$ is the Young modulus of exponential type defined in (2.14) [but
with $\psi_1(x) = e^x - 1$] and $\|\cdot\|_{\psi_{2/k}}$ is the associated (pseudo)norm (2.15).
Applying (3.48) to $L = K_\lambda - K_{\lambda'}$, we obtain

$$\|X_{K_\lambda,n} - X_{K_{\lambda'},n}\|_{\psi_{2/k}} \le \|X_{K_\lambda - K_{\lambda'},n}\|_{\psi_{2/k}} \le C \|K_\lambda - K_{\lambda'}\|_p. \tag{3.49}$$

Then this bound and (3.45) allow us to apply the usual entropy integral 
bound, for example, in the version given in de la Peña and Giné [7], Corollary
5.1.5, and conclude that for some constant D, keeping in mind that G fc[ a ,b] > 

rC\\K\\ p 

%/ k (N(JC [a)b] ,d p ,e))de, 



(3.50) 



sup X^ A 



A," 

a<A<6 



*2/fe 



where $\mathcal{K}_{[a,b]}$ is defined by (2.17) and $d_p$ is the $L_p(\mathbb{R}^d)$ distance defined in
(2.18) (technically, this only holds for any separable version of the process
$X_{K_\lambda,n}$, but we see in a remark below that this process itself is separable).
Now, since up to constants $\psi_\alpha$ is increasing in $\alpha$, hypothesis (a) in Theorem
2 implies that this integral is finite for all $2 \le k \le m$. Taking into account
that the Orlicz distances of exponential type dominate (up to constants) the
$L_r(\Pr)$ distances, inequality (3.50) implies that for all $2 \le k \le m$,

E ^l^> ( ^)ll,s — _£^ , 

for some constant C that depends on k, p, a and b. The condition n\in V — > 
cxd implies that this expectation tends to zero for 2 < k < m, proving (3.8) 
for p > 2. □ 

Proof of (3.8) for $1 \le p < 2$. Since $f_g$ and $K^2$ are in $L_1(\mu_s)$ for some
$s > d(2-p)/p$, Lemma 1 applies and therefore Corollary 3 and inequality
(3.41) hold. Now, the proof of (3.8) and subsequently Theorem 3 follow
exactly as the proof of Theorem 2, but using Corollary 3 and its consequence
(3.41) instead of Theorem 9 and its consequence for $X_{L,n}$. These give

||X£, n ||* 1/4 < Cn^hl/^WLWp^ 

instead of (3.48), which yields, as in the previous proof, for some constant 
D, 



sup X Kx , 

a<X<b 



,,„ , u fC\\K\\ p/2 , s 

< Dn^hH 2 ' 1 ^ I V-f k (N(lC laib] ,d p Vd 2>s ,e))de 
instead of (3.50). Hence, for some constant C,



1 



< C\\K\\p,2,. , n 



if $nh_n \to \infty$. □



Remark 5 (Separability of the process $X_{K_\lambda,n}$). In the previous subsection
technically we must make sure that

$$\sup_{\lambda \in [a,b]} X_{K_\lambda,n} = \sup\Bigl\{\max_{\lambda \in C} X_{K_\lambda,n} : C \text{ finite},\ C \subset [a,b]\Bigr\}$$

in order to ensure that the entropy bound applies exactly to the process
$X_{K_\lambda,n}$ and not to a modification thereof. For this, by standard arguments
(basically the monotone convergence theorem), it suffices that there exist
$D \subset [a,b]$ countable such that

$$\sup_{\lambda \in [a,b]} X_{K_\lambda,n} = \sup_{\lambda \in D} X_{K_\lambda,n} \quad \text{a.s.} \tag{3.51}$$


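Let $D$ denote the rationals in $[a,b]$. To show (3.51) it suffices to prove that,
with probability 1, for each $\lambda \in [a,b]$ and any sequence $\lambda_m$ in $D$ such that
$\lambda_m \to \lambda$ we have $\lim_{m\to\infty} X_{K_{\lambda_m} - K_\lambda, n} = 0$. In turn, to verify this it is enough
to check that

$$\lim_{m\to\infty} \Bigl(\int_{\mathbb{R}^d} \bigl|\pi_k(\bar K_{\lambda_m h_n} - \bar K_{\lambda h_n})(t, X_{i_1},\dots,X_{i_k})\bigr|^p\,dt\Bigr)^{1/p} = 0. \tag{3.52}$$
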



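Now, each of these $L_p$ norms is bounded by a finite number of terms of the
form

$$\Bigl(\int_{\mathbb{R}^d} \bigl|E(K_{\lambda_m h_n} - K_{\lambda h_n})(t - V)\bigr|^p\,dt\Bigr)^{1/p},$$

where $V$ is a random variable. Observe that

$$\Bigl(\int_{\mathbb{R}^d} \bigl|E(K_{\lambda_m h_n} - K_{\lambda h_n})(t - V)\bigr|^p\,dt\Bigr)^{1/p} \le \Bigl(E \int_{\mathbb{R}^d} \bigl|(K_{\lambda_m h_n} - K_{\lambda h_n})(t - V)\bigr|^p\,dt\Bigr)^{1/p}$$
$$= \Bigl(E \int_{\mathbb{R}^d} \Bigl|\frac{1}{\lambda_m h_n} K\Bigl(\frac{t - V}{(\lambda_m h_n)^{1/d}}\Bigr) - \frac{1}{\lambda h_n} K\Bigl(\frac{t - V}{(\lambda h_n)^{1/d}}\Bigr)\Bigr|^p\,dt\Bigr)^{1/p},$$

which by the change of variables inside the integral $u = (t - V)/(\lambda_m h_n)^{1/d}$
is equal to

$$(\lambda_m h_n)^{1/p} \Bigl(\int_{\mathbb{R}^d} \Bigl|\frac{1}{\lambda_m h_n} K(u) - \frac{1}{\lambda h_n} K\Bigl(\Bigl(\frac{\lambda_m}{\lambda}\Bigr)^{1/d} u\Bigr)\Bigr|^p\,du\Bigr)^{1/p}$$
$$= (\lambda_m h_n)^{1/p - 1} \Bigl(\int_{\mathbb{R}^d} \Bigl|K(u) - \frac{\lambda_m}{\lambda} K\Bigl(\Bigl(\frac{\lambda_m}{\lambda}\Bigr)^{1/d} u\Bigr)\Bigr|^p\,du\Bigr)^{1/p}.$$

Now since $K \in L_p(\mathbb{R}^d)$ we have $\lim_{\gamma \to 1} \int_{\mathbb{R}^d} |K(u) - K(\gamma u)|^p\,du = 0$, and this,
in turn, implies that

$$\lim_{m\to\infty} \int_{\mathbb{R}^d} \Bigl|K(u) - \frac{\lambda_m}{\lambda} K\Bigl(\Bigl(\frac{\lambda_m}{\lambda}\Bigr)^{1/d} u\Bigr)\Bigr|^p\,du = 0,$$

which gives (3.51).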
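
The continuity of dilations in $L_p$ used in the last step is standard; a sketch of the usual density argument follows. The function $\phi$ (continuous with compact support) and the scalar $c$ are auxiliary names introduced only for this sketch, and $p < \infty$ is assumed.

```latex
% K in L_p(R^d), 1 <= p < infinity; phi continuous with compact support,
% chosen so that \|K - \phi\|_p < \varepsilon.
\begin{align*}
\|K - K(\gamma\,\cdot)\|_p
  &\le \|K - \phi\|_p + \|\phi - \phi(\gamma\,\cdot)\|_p
       + \|\phi(\gamma\,\cdot) - K(\gamma\,\cdot)\|_p \\
  &= \bigl(1 + \gamma^{-d/p}\bigr)\|K - \phi\|_p
       + \|\phi - \phi(\gamma\,\cdot)\|_p , \\
\|\phi - \phi(\gamma\,\cdot)\|_p &\to 0 \quad (\gamma \to 1)
  \qquad \text{by uniform continuity and bounded support of } \phi, \\
\|K - c\,K(\gamma\,\cdot)\|_p
  &\le \|K - K(\gamma\,\cdot)\|_p + |1 - c|\,\gamma^{-d/p}\,\|K\|_p .
\end{align*}
```

Applying the last line with $c = \lambda_m/\lambda$ and $\gamma = (\lambda_m/\lambda)^{1/d}$, both of which tend to 1, gives the limit stated above.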